
A prediction based model for forex markets combining

Genetic Algorithms and Neural Networks

Rui Miguel Hungria Furtado

Thesis to obtain the Master of Science Degree in

Telecommunications and Computer Science Engineering

Supervisor: Prof. Rui Fuentecilla Maia Ferreira Neves

Examination Committee
Chairperson: Prof. Ricardo Jorge Fernandes Chaves
Supervisor: Prof. Rui Fuentecilla Maia Ferreira Neves
Member of the Committee: Prof. Aleksandar Ilic

October 2018
To my family and friends

Acknowledgments

First, I would like to thank my supervisor, Prof. Rui Neves, who provided weekly support throughout the development of this thesis without being overly strict about the path needed to achieve the established goals. He allowed the thesis to be entirely my own work, but guided me in the right direction whenever needed.
Second, I want to thank my colleagues, who gave me the strength and advice to finish this work. It was a pleasure to share with you this academic journey that now comes to an end.
Finally, I want to thank my family and closest friends. Without your patience and unconditional support this thesis would not have been possible.

Resumo

Investing in financial markets is always a complex and uncertain task. To increase the small chances of achieving a return that beats the market index, investors resort to a series of techniques that attempt to determine future market entry and exit points.
This thesis proposes a trading system optimized for the Foreign Exchange Market, commonly known as FOREX. To perform this task, a Feedforward Neural Network (FNN) is used, receiving as input a set of technical indicators (TI) computed from historical FOREX market data with an hourly sampling rate. The system follows a supervised learning methodology to create the target variables, converting hourly returns into a binary signal and thereby framing the task as a classification problem. To obtain the best set of parameters used to generate the technical indicators and the hyperparameters of the neural network, an Evolutionary Strategy (ES) based on a Genetic Algorithm (GA) was developed, since an exhaustive search over the solution space would take far too long. The Genetic Algorithm also provides an automatic feature selection process, so that only the most relevant features are kept.
The proposed system was tested with historical data from 5 different markets, in order to test different investment conditions. The produced strategies are then evaluated against classical investment strategies. The obtained results show that this approach is capable of beating the Buy & Hold (B&H) strategy in the GBP/USD market, achieving an average Return on Investment (ROI) of 14.19%, against 10.69% ROI for B&H. The system also beat the Sell & Hold (S&H) strategy for the USD/CHF pair, achieving an ROI of 4.45% against 4.09% for S&H. The use of Batch Normalization as a preprocessing technique during the development of each market strategy is also discussed.

Keywords: Genetic Algorithms, Deep Learning, Machine Learning, Feature Optimization, FOREX, Technical Analysis

Abstract

Investing in financial markets is always a complex and difficult task. To raise the small chances of beating the market, investors usually rely on several techniques that attempt to determine the underlying trading signal and, hopefully, predict future market entry and exit points.
This thesis proposes a trading system optimized for the Foreign Exchange Market, widely known as FOREX. To perform this task, we use a Feedforward Neural Network (FNN) that takes as input features a set of technical indicators (TI) calculated from FOREX hourly data. A supervised learning approach was used to create the target variables, converting hourly returns into a binary trading signal suitable for a classification problem. To find the best combination of parameters used to generate each TI and the FNN hyperparameters, we deployed an Evolutionary Strategy (ES) based on a Genetic Algorithm (GA), since an exhaustive search through the entire feature space would be unfeasible. The GA also deploys an automatic Feature Selection (FS) mechanism that enables the FNN to use only the features relevant to the problem at hand.
The proposed system is tested with real hourly data from 5 different markets, each one exhibiting different behavior during the sampled period. The produced investment strategies are benchmarked against classical trading strategies. The achieved results show that this approach is capable of outperforming the Buy and Hold (B&H) strategy in the GBP/USD market, achieving an average Return on Investment (ROI) of 14.19%, against 10.69% ROI for B&H. The system also outperformed the Sell and Hold (S&H) strategy for USD/CHF, achieving an ROI of 4.45% against 4.09% for S&H. Furthermore, the use of Batch Normalization as a preprocessing technique during the development of each market strategy is also discussed.

Keywords: Genetic Algorithms, Deep Learning, Machine Learning, Feature Optimization, FOREX,
Technical Analysis

Contents

Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Nomenclature
Glossary

1 Introduction
1.1 Motivation
1.2 Main Contributions
1.3 Goals
1.4 Document structure

2 Background
2.1 Market trading
2.1.1 FOREX market
2.1.2 Financial data
2.1.3 Technical Analysis
2.1.4 Trend following TI
2.1.5 Momentum oscillators
2.1.6 Volatility
2.1.7 Other indicators
2.2 Machine Learning
2.2.1 Artificial Neural Networks
2.2.2 Genetic Algorithms
2.3 Related Work
2.3.1 Related works on Neural Networks
2.3.2 Related works on Genetic Algorithms

3 Implementation
3.1 Model overview
3.2 User input
3.2.1 Data
3.3 Feature calculation
3.4 Optimization
3.4.1 Population generation
3.4.2 Model creation
3.4.3 Fitness computation
3.4.4 GA operators
3.5 Model prediction
3.6 Market simulation

4 Results
4.1 FOREX Data
4.1.1 Data statistics
4.2 Evaluation metrics
4.2.1 Classification metrics
4.2.2 Financial metrics
4.3 Experimental setup
4.4 Case study A - Simple prediction
4.4.1 Classification results
4.4.2 Market simulator
4.5 Case study B.1 - Accuracy as fitness function
4.5.1 Classification results
4.5.2 Market Simulator
4.6 Case study B.2 - ROI as fitness function
4.6.1 Classification results
4.6.2 Market Simulator
4.7 Case Study 3 - Further investigation on profitable markets
4.7.1 Benchmark comparisons
4.7.2 USD/CHF
4.7.3 GBP/USD
4.7.4 GBP/USD without Batch Normalization
4.7.5 Feature selection results
4.7.6 Fitness evolution
4.7.7 Topology evolution

5 Conclusions
5.1 Future Work

Bibliography

A Topology evolution plots
List of Tables

4.1 Summary of market indices
4.2 Summary of market returns
4.3 Distribution shape descriptors
4.4 System parameters
4.5 Classification results
4.6 Classification results with Batch Normalization
4.7 Financial results
4.8 Financial results with Batch Normalization
4.9 Classification results with ACC fitness
4.10 Financial results with ACC fitness
4.11 Classification results with ROI fitness
4.12 Financial results with ROI fitness
4.13 USD/CHF strategies comparison
4.14 GBP/USD strategies comparison
4.15 GBP/USD without BN strategies comparison
4.16 Selected TI
4.17 Solutions architecture
List of Figures

2.1 Currency pair EUR/USD
2.2 Time series plot
2.3 SMA, EMA and HMA plot with time frame of 20 days
2.4 MOM plot
2.5 MACD and its 3 signals plot
2.6 Bollinger Bands plot
2.7 Double Smoothed Stochastic plot
2.8 Perceptron architecture
2.9 Feedforward Neural Network architecture
2.10 Sigmoid function
2.11 Binary 4-gene chromosome representation
2.12 Tournament selection method
2.13 Crossover operation
2.14 Genetic algorithm flowchart

3.1 System workflow
3.2 Raw input data
3.3 SMA csv
3.4 Optimization layer
3.5 Chromosome structure
3.6 Final prediction pipeline
3.7 Market simulation

4.1 EUR/USD market index
4.2 GBP/USD market index
4.3 GBP/JPY market index
4.4 USD/JPY market index
4.5 USD/CHF market index
4.6 Non-normalized test ACC vs normalized test ACC
4.7 Non-normalized test ROI vs normalized test ROI
4.8 Test ACC vs test ROI
4.9 Maximum Drawdown for GBP/USD
4.10 Best, average and worst system individuals for GBP/USD
4.11 Best, average and worst system individuals for USD/CHF
4.12 USD/CHF strategies evolution over time
4.13 USD/CHF market entry points
4.14 GBP/USD strategies evolution over time
4.15 GBP/USD without BN strategies evolution over time
4.16 GBP/USD without BN market entry points
4.17 GBP/USD histogram
4.18 USD/CHF histogram
4.19 USD/CHF and GBP/USD number of features over generations
4.20 GBP/USD box-and-whisker plot
4.21 USD/CHF box-and-whisker plot
4.22 USD/CHF ROI vs generation
4.23 15 most used numbers of neurons in GBP/USD 1st FNN layer
4.24 15 most used numbers of neurons in GBP/USD 2nd FNN layer
4.25 15 most used numbers of neurons in USD/CHF 1st FNN layer
4.26 15 most used numbers of neurons in USD/CHF 2nd FNN layer

A.1 Evolution of the number of neurons in GBP/USD
A.2 Evolution of the number of neurons in USD/CHF
Nomenclature

Optimization and Computer Engineering Related

ACC Accuracy

DL Deep Learning

EC Evolutionary Computing

FNN Feedforward Neural Network

GA Genetic Algorithm

ML Machine Learning

NN Neural Network

Investment Related

AA Aroon

ADX Average Directional Index

ATR Average True Range

BB Bollinger Bands

CCI Commodity Channel Index

CMO Chande Momentum Oscillator

DPO Detrended Price Oscillator

DSS Double Smoothed Stochastic

EMA Exponential Moving Average

EMH Efficient Market Hypothesis

HMA Hull Moving Average

KURT Kurtosis

MACD Moving Average Convergence Divergence

MDD Maximum Drawdown

MOM Momentum

PO Price Oscillator

ROC Rate of Change

ROI Return on Investment

RSI Relative Strength Index

SKEW Skewness

SMA Simple Moving Average

STD Standard Deviation

STV Standard Variance

TA Technical Analysis

TI Technical Indicator

Chapter 1

Introduction

Computational Finance has been growing over the years. The latest progress in the Machine Learning (ML) area and the increasing complexity of financial securities, backed by a huge growth in computer processing power, have boosted the area of quantitative analysis. This field continuously tries to explain the behavior of financial markets through complex mathematical measures and calculations, normally using techniques such as stochastic calculus, statistical models, and data analysis.
Quantitative analysis tries to translate a given reality into numerical values, with the goal of reducing investment risk while generating the highest possible profit. The potential and diversity shown by such methods led to the creation of powerful systems capable of providing promising results in forecasting how markets will react in the relatively near future. These systems are often used by large firms, but since those firms reveal neither how their systems perform nor what results they achieve, this domain remains a highly competitive and private environment. Although it is undeniable that this is a field with outstanding potential for improvement, forecasting the market is, and will always be, considered a controversial activity.
There is a well-established community of critics who claim that the market cannot be predicted, and that all efforts to accomplish this goal are useless due to the randomness associated with market variations. In fact, many economists say that the market is completely unforecastable. For example, in his 1973 bestseller "A Random Walk Down Wall Street", Burton Malkiel stated that "a blindfolded monkey throwing darts at a newspaper's financial pages could select a portfolio that would do just as well as one carefully selected by experts" [1]. The Random Walk Theory (RWT) affirms that stock prices are completely random, making it impossible to outperform the market. This randomness is explained by the Efficient Market Hypothesis (EMH), which states that financial markets are efficient and that prices already reflect all known information concerning a stock [2]. This implicitly states that trends and patterns observed in past data are not correlated with future outcomes, and that the arrival of new information is apparently random as well.
This is in direct opposition to Technical Analysis (TA), which claims that a stock's future price can be forecast from historical information, by observing chart patterns created by Technical Indicators (TI) - a vast set of mathematical formulations based on past prices, widely used by active traders to make educated guesses about future market trends and to identify suitable entry and exit points in the market. We directly apply this classic trading methodology to our model, to empirically show that the EMH is not completely right. The hypothesis is that if people using this type of procedure can consistently beat the market, then ML methods backed by Evolutionary Computation (EC), such as the Genetic Algorithms (GA) used in this work, should also be able to reproduce an identical behavior in an automatic way, avoiding the need for manual labour in defining a trading strategy.
However, there is always a certain degree of randomness associated with every market, a trait that can never be dissociated from it. The market is, and will always be, a noisy, non-stationary, and non-linear system.

1.1 Motivation

The Foreign Exchange Market (FOREX) is the global market for currency trading. It is considered the largest and most liquid financial market in the world. The reason behind its current and increasing popularity can be easily explained by the vast amount of benefits associated with this type of market (section 2.1.1). Forecasting financial markets has become a major topic of interest to investors, used as a measure to secure and manage their portfolio's "health". Therefore, the main motivation of this work is to create a suitable system for the FOREX market, capable of providing the best possible Return on Investment (ROI).
We intend to study whether Deep Learning (DL) is a suitable tool to deal with Time Series (TS) analysis, exploring its adaptability to different FOREX markets by forecasting hourly returns transformed into binary classifications.
We also want to determine whether evolutionary techniques such as GA are a good feature selection and optimization tool, reducing the used features to a number capable of extracting the maximum performance out of each produced Feedforward Neural Network (FNN) model, while some of its internal hyperparameters are optimized by the GA as well. An important point in the optimization process is also to determine whether the accuracy achieved by the deployed model is correlated with the achieved ROI, i.e., whether an optimized FNN with the highest possible accuracy is also capable of providing the highest ROI.

1.2 Main Contributions

This thesis tries to contribute to and advance the field of Computational Finance, exploring and combining the above-mentioned techniques on a set of different FOREX markets. The main contributions of this work are the following:

• The application of DL algorithms such as FNN to the analysis of FOREX markets, which allows the usage of bigger volumes of data that condense information at a finer level of granularity (for example, hourly records instead of daily records);

• The usage of novel DL procedures to tweak and optimize the FNN data processing;

• The usage of a GA as a feature selection tool, deciding the best TI features for the system, while individual feature parameters are optimized as well;

• The integration of two different fitness functions, accuracy and ROI, and the study of the markets in which improving one metric is related to improving the other;

1.3 Goals

By combining the above-mentioned Machine Learning techniques with trading knowledge, the goal of this work is to create forecasting software that can offer a sufficiently accurate market prediction, ultimately leading to a trading strategy that minimizes the risk involved in market investments. To achieve this objective, the project intends to fulfill the following requirements:

• Explore different FOREX markets, studying their potential for trading profit;

• Use TA as feature generation tool;

• Explore the suitability of FNN for time series analysis;

• Use market returns, transformed into binary labels, as the model target;

• Explore the potential of using a GA for technical indicator parameter optimization;

• Explore the potential of using a GA for FNN hyperparameter optimization;

• Select an optimal set of TI, through the use of GA feature selection;

• Forecast one hour-ahead binary market returns;

• Use predicted outputs to create a trading strategy;

• Compare the performance of the created strategy with the traditional trading strategies used by traders.

1.4 Document structure

The remainder of this document is organized as follows:

Chapter 2: Provides the relevant background for the work, covering the financial and technological concepts related to the theme, and reviews the related work on Neural Networks and Genetic Algorithms applied to financial domains.

Chapter 3: Describes the implementation of the proposed solution, combining the techniques introduced in the previous chapter.

Chapter 4: Introduces the evaluation metrics for each step of the created model and presents the obtained results.

Chapter 5: Concludes the work and identifies the potential for new developments in this field.

Chapter 2

Background

This chapter provides an overview of the most important concepts related to this thesis. We divided it into 3 main domains: market trading, Machine Learning and Evolutionary Computation.
In the financial part, we take a closer look at the history of market trading and the types of techniques commonly used to analyze market evolution. After that, we dive into ML concepts, especially DL, a subset of ML created to deal with sizable batches of data, as is the case with Neural Networks (NN). To conclude this part, we introduce the chosen prediction model, an FNN, and all the details and technical insight that compose this complex algorithm. Finally, we present EC, the family of algorithms for global optimization inspired by biological evolution to which GA belong. It is also important to state that the focus of this thesis is not the financial world per se, but rather the combination of trading with the computational domain.

2.1 Market trading

To develop a better understanding of the subject addressed by this work, it is important to give the reader some clear notions related to the stock trading system and the financial world. Stock markets are currently one of the most important parts of today's global economy, since they influence the economic development of several countries [3], have the ability to create new opportunities for business growth [4], and are frequently considered a valuable indicator of how healthy a specific country's economy is. A trading market is essentially a place where buyers and sellers come together in order to exchange different types of financial products. Different products are generally traded in different markets. The most popular among all the available markets is, without any doubt, the stock market, where people trade shares of publicly traded companies, giving the buyer an ownership stake in a corporation. Other important markets worth naming are, for example, the bond market - where buyers loan money to an entity at a fixed interest rate - and the currency market - where currency pairs are traded, as is the case of the FOREX market.
Despite the differences among markets, the actions performed in each market remain the same. When trading, an investor can assume three different positions: long, short and neutral. A long position simply indicates an order to purchase a security, hoping that its price will increase. Once the price rises, the investor has to make the decision of selling the purchased positions, or keeping them in order to create even more profit. There is always the risk of the price falling, but the loss is always limited to the amount invested. Alternatively, a short position in the market represents exactly the opposite: the investor borrows a security from a broker and sells it immediately at the current market price, with the expectation that its value will decline. When the price decreases, the investor repurchases the same shares and returns them to the broker, making a profit out of the created price difference. This action substantially increases the trading risk, and can magnify the potential losses if prices increase. Finally, a neutral position is usually taken when the investor thinks that a security is going neither up nor down, therefore preferring to stay out of the market.
Market positions should be taken according to the current market trend [5]. There are three possible market trends: bull, bear and sideways. A bull market is a market where prices are continuously increasing or are expected to increase. It is characterized by constant uptrends, which reflect optimism about securities' growth, raising an opportunity for successful long positions. Bear markets, on the other hand, are purely the opposite: they reflect pessimism about price growth, and raise an opportunity for successful short positions. A sideways trend represents a constant up-and-down movement, with prices bouncing around a given range.

2.1.1 FOREX market

The FOREX market is the buying and selling market for currencies. FOREX is one of the largest and fastest-growing markets in the world, with average daily turnover reaching nearly 5.1 trillion dollars in April 2016, according to the 2016 Triennial Central Bank Survey of FX [6]. This increasing popularity is related to some relevant aspects:

• Trading takes place 24 hours a day, 5 days a week. There are 4 FOREX markets: London, New York, Sydney and Tokyo. Every market opens 8 hours per day, starting at different hours due to time zone differences. This means that a trader can easily switch from one market to another when the first one closes.

• It is not necessary to invest in a specific company or sector. The only choice to be made is the currency pair that is going to be traded during the trading period.

• FOREX can remain profitable even in the worst times, because currencies are always exchanged in pairs: when the value of one currency declines, another currency's value inevitably rises.

• The entry costs are extremely low compared to other well-known markets.

Currency pairs: The currency pairs available in FOREX can be divided into three types: majors, crosses and exotics. The most important pairs belong to the majors group: EUR/USD, GBP/USD, USD/JPY, USD/CHF, USD/CAD, AUD/USD and NZD/USD. Crosses do not include the U.S. dollar and are ideal for a diversified portfolio. The exotics group includes pairs from developing countries, so they are quite illiquid, with very high spreads.

Currency pair quotes: It is quite simple to understand what a given quote stands for. In fig. 2.1 we can see an example of the EUR/USD currency pair. The first acronym represents the base currency and the second one the quote currency. The value given to the pair represents how much one unit of the base currency is worth when converted to the quote currency. For example, if the EUR/USD pair has the value 1.5, it means that one euro (EUR) is worth 1.5 US dollars (USD).

Figure 2.1: Currency pair EUR/USD

PIP & Spread: PIP stands for Price Interest Point and represents the smallest possible change that an exchange rate can make. For currency pairs that include the US dollar, a PIP is 1/10000 of a dollar, whereas when the rate includes the yen (JPY) a PIP is just 1/100. For example, a 5 pip spread for EUR/USD is 1.1530/1.1535. Having said that, we can divide a currency pair quote in two parts: the buying price and the selling price. In the example above, the buying price is 1.1530 and the selling price is 1.1535, meaning that 1.1530 is the price at which you sell this currency to the broker, and 1.1535 the price you have to pay if you want to buy this currency from the broker. The spread is simply the difference between the selling price and the buying price. This represents the profit taken by the broker in each transaction, since there are no costs associated with making a trade, or monthly account management fees like the ones found in other well-known markets.
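To make the quote arithmetic concrete, the minimal sketch below (not part of the thesis system; function names and the example quote are illustrative) computes the spread of a quoted pair in pips, using the 1/10000 pip size for USD-quoted pairs and 1/100 for JPY-quoted ones:

def pip_size(pair: str) -> float:
    # Pip size: 1/100 for JPY-quoted pairs, 1/10000 otherwise, as described above.
    return 0.01 if pair.endswith("JPY") else 0.0001

def spread_in_pips(pair: str, buy_price: float, sell_price: float) -> float:
    # Spread = selling price minus buying price, expressed in pips
    # (rounded to absorb floating-point noise).
    return round((sell_price - buy_price) / pip_size(pair), 1)

# Example from the text: EUR/USD quoted 1.1530/1.1535 -> a 5 pip spread.
print(spread_in_pips("EUR/USD", 1.1530, 1.1535))  # 5.0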

2.1.2 Financial data

Financial data is represented through time series that sample a specific time period at a daily, monthly, hourly or even smaller sample rate. In comparison to other time series, financial time series present some special properties, given by the underlying structure of financial markets. This is mainly due to the huge number of different factors that may influence final market prices. Figure 2.2 presents a time series plot of a market closing price.
Time series patterns can be decomposed into four main components: trend, seasonality, cyclic movements and irregular fluctuations [7]. A trend pattern is a long-term increase or decrease present in the data. It can be seen as the directional movement taken by the analyzed series, either global - a trend applied to the whole time series - or local - applied to a sub-sequence of the time series. The seasonal component reflects regular fluctuations influenced by seasonal factors. Seasonal patterns are noticeable when data undergoes predictable changes within a fixed period of time - a year, month, week, day or less. Cyclic movements, on the other hand, represent long-term oscillations occurring in the time series around a specific trend. Finally, irregular fluctuations are the random component associated with time series. They are unlikely to be repeated, and are generally associated with non-predictable events. The most recognizable feature of financial time series is the extremely high frequency of possible values, creating high volatility that usually changes over time. This is due to the influence of external non-systematic factors, which lead to irregular fluctuations. By contrast, the systematic factors that influence a financial time series create the cycle and trend patterns present in it. Seasonality does not play an important role in this type of series [8].
With these properties in mind, we can describe financial time series as:

• Temporally ordered events: Unlike the series used in cross-sectional studies, financial time series are naturally ordered events, which means that observations can never be shuffled or mixed when conducting a time series analysis.

• Non-linear: When modelling financial time series, the relationship between the independent variables used cannot be explained as a linear combination.

• Non-stationary: Time series data presents different statistics at different times. The mean and variance shown by a financial time series change constantly, due to the high frequency of values that prices can assume.

• Noisy: Every sampled time series is corrupted by an amount of noise that pollutes the signal. Financial time series are no exception, and the great volume of random information makes it harder to predict future price values.

Figure 2.2: Time series plot

2.1.3 Technical Analysis

There are two methodologies that traders use to make investment decisions: Technical Analysis (TA) and Fundamental Analysis (FA). For the purposes of this work, we are going to focus on TA applied to FOREX [9]. FA focuses on the economic and financial factors that affect a business, such as market conditions, data related to business management, and economic news, aiming to measure a security's intrinsic value. On the other hand, TA is a methodology used to forecast the future price direction of the market by analyzing market data gathered from trading activity. Technical analysts focus on charts of price movement and various analytic tools, relying purely on statistical metrics to evaluate the health and strength of a given security. In the following subsections we present a set of Technical Indicators (TI), the most important analytic tool used by technical analysts. Before digging any deeper into how TA and TI should be used, it is very important to understand the underlying assumptions and principles of the domain. There are 3 basic principles [10]:

1. The market discounts everything: This statement assumes that all past, current and future information is already discounted into the markets and reflected in the price of securities. Even though TA ignores fundamental factors related to firms, the market still discounts that information, considering everything from inflation to the emotions of investors.

2. Price moves in trends: According to TA, prices always follow some kind of trend. It is believed that the probability of prices following a specific trend is much higher than that of sudden erratic movements.

3. History tends to repeat itself: TA states that there is a repetitive nature associated with price movements, and that market participants tend to react in a similar way to similar events. This means that historical data can be used as an important instrument for predicting how markets will behave.

These 3 main ideas lay down the foundations for modern TA. They were first developed and introduced in the Dow Theory by Charles Dow [10]. Released in the 1800s, it was the first investment and trading theory, and was later refined by William Peter Hamilton [11].
TA is used in the form of Technical Indicators (TI). TI are simply mathematical calculations (traditionally on price or volume) based on the past variations of the market and defined by a formula [12]. Although there is a great number of different TI, they can be classified according to their oscillatory behavior as trend following or momentum oscillators [13]. Next we explain the ones used in this work.

2.1.4 Trend following TI

A trend following indicator tries to identify trends in the market. A trend represents a consistent change in prices, i.e., in the investors' expectations [13]. These indicators are extremely useful to identify entry or exit points in a specific trend cycle, by checking whether the cycle has begun or ended at the crossovers between the TI and the market index. We used the following:

Simple Moving Average - SMA

Moving averages (MA) are the most basic tool used in TA. Their main goal is to smooth the trading signal according to a predefined lag, defined by a number of past periods. They are usually used as noise filters, because they allow the trader to see a cleaner signal by reducing the number of oscillations, averaging the price over the number of selected past periods. Traders that use MA usually intend to buy when the MA crosses the price index in a descending way, and to sell when it crosses in an ascending way. The SMA is obtained simply by taking the arithmetic mean of the past n time periods, where Close(i) represents the price of an asset on a specific day [12].

$SMA_n = \sum_{i=1}^{n} \frac{Close(i)}{n}$ (2.1)

Exponential Moving Average - EMA

It follows the same principle as the SMA, but it gives more importance to the latest information. This is performed by assigning exponentially more weight to the latest available data.

$EMA_n = Close_d \times \alpha + EMA_{n-1} \times (1 - \alpha)$, with $\alpha = \frac{2}{n+1}$ (2.2)

In equation 2.2, d refers to the current day, n the number of past time periods and α the smoothing factor [12].

Figure 2.3: SMA, EMA and HMA plot with time frame of 20 days

Hull Moving Average - HMA

The HMA is an extremely fast and smooth moving average that almost eliminates lag while improving the smoothness of the resulting average.


$HMA_n = WMA(2 \times WMA(data, n/2) - WMA(data, n), n)$ (2.3)

Equation 2.3 shows the HMA formula. It is built from a Weighted Moving Average (WMA) of the difference between two WMA over the n period, with n representing the number of past time periods and data the whole time series [12]. We chose to write this formula in a more abstract way, to benefit readability.

In fig. 2.3 we present the three moving averages announced above, using a time period of twenty days, making it possible to observe the different behaviour assumed by each one of them.
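The thesis implementation is not shown; as a minimal illustrative sketch, the three averages reduce to a few lines of pandas, assuming close is a pandas Series of closing prices. Note that the usual concrete HMA definition smooths the final WMA over sqrt(n) periods, which is the choice made here:

import numpy as np
import pandas as pd

def wma(series: pd.Series, n: int) -> pd.Series:
    # Weighted moving average with linearly increasing weights 1..n.
    weights = np.arange(1, n + 1)
    return series.rolling(n).apply(lambda w: np.dot(w, weights) / weights.sum(), raw=True)

def sma(close: pd.Series, n: int) -> pd.Series:
    # Equation 2.1: arithmetic mean of the last n closes.
    return close.rolling(n).mean()

def ema(close: pd.Series, n: int) -> pd.Series:
    # Equation 2.2: pandas' span parameter sets alpha = 2 / (n + 1).
    return close.ewm(span=n, adjust=False).mean()

def hma(close: pd.Series, n: int) -> pd.Series:
    # Equation 2.3; the final WMA is smoothed over sqrt(n) periods (common concrete choice).
    return wma(2 * wma(close, n // 2) - wma(close, n), int(np.sqrt(n)))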

Aroon - AA

The Aroon indicator attempts to identify whether a market is trending or not, and how strong the current trend is. It quantifies the time needed for the price to reach its highest and lowest points over a set time period, as a percentage of the total time. It is composed of 2 separate indicators, Aroon-Up and Aroon-Down. As initial parameter, both indicators receive one individual value n, indicating the number of periods since an n-day high for Aroon-Up, and the number of periods since an n-day low for Aroon-Down [14]. Aroon values oscillate between 0 and 100: the higher the Aroon-Up, the stronger the uptrend; the higher the Aroon-Down, the stronger the downtrend.

$AroonUp = \frac{n - \text{periods since } n\text{-day high}}{n} \times 100$ (2.4a)

$AroonDown = \frac{n - \text{periods since } n\text{-day low}}{n} \times 100$ (2.4b)
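A possible pandas sketch of equation 2.4, reusing the imports above; for simplicity it takes the n-day highs and lows from the close series (the high/low series could equally be substituted), and the default n = 25 is only illustrative:

def aroon(close: pd.Series, n: int = 25):
    # argmax/argmin return the position (0 = oldest, n = newest) of the extreme
    # inside each (n+1)-bar window, so position / n * 100 is exactly eq. 2.4.
    up = close.rolling(n + 1).apply(lambda w: w.argmax() / n * 100, raw=True)
    down = close.rolling(n + 1).apply(lambda w: w.argmin() / n * 100, raw=True)
    return up, down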

2.1.5 Momentum oscillators

A momentum-based indicator tries to measure the velocity of directional price movements, in order to identify the speed/strength of a price change influenced by the enthusiasm of the buyers and sellers involved in the price development [13]. This type of indicator is especially important to predict rapid, unexpected changes in the behavior of financial assets.

Momentum - MOM

The simplest TI among all oscillators, it measures the absolute difference between today's close value and the close value n days ago. Its values express the existing trend: if the values assumed by the Momentum (MOM) are positive, we are in an uptrend; if negative, we are in the presence of a downtrend.

$MOM_n = close_t - close_{t-n}$ (2.5)

Equation 2.5 is applied at each point of the closing price series, with t representing the current day and n the number of considered past periods [12].

Figure 2.4: MOM plot

Rate of Change - ROC

The Rate of Change (ROC) is a momentum indicator that measures the speed of change of a value over a predefined period of time. If values are above 50%, traders should be aware of overbought conditions; if they are below -50%, we are in an oversold period. In terms of calculation, ROC takes the current value of a stock or index and divides it by the value from an earlier period. The formula is:

$ROC_n = \frac{Close - Close_n}{Close_n} \times 100$ (2.6)

With n being the number of past periods used, and Close corresponding to the available closing prices [12].
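Both MOM and ROC map directly onto built-in pandas operations; a minimal sketch under the same assumptions as the previous examples:

def mom(close: pd.Series, n: int) -> pd.Series:
    # Equation 2.5: difference to the close n periods ago.
    return close.diff(n)

def roc(close: pd.Series, n: int) -> pd.Series:
    # Equation 2.6: percentage change relative to the close n periods ago.
    return close.pct_change(n) * 100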

Relative Strength Index - RSI

It measures the speed and change of price movements. The RSI is most commonly used on a 14-day time-frame, oscillating between 0 and 100. Traditionally, and according to Wilder, the RSI is considered overbought when above 70 and oversold when below 30 [12]. Avg Gain represents the average gain of the up periods during the specified time-frame; the same applies to Avg Loss, but using the down periods. As initial parameter it receives n, the number of past periods to be considered.

$RSI = 100 - \frac{100}{1 + RS}$ (2.7a)

$RS = \frac{Avg\ Gain}{Avg\ Loss}$ (2.7b)

With Avg Gain being given by the sum of gains over the past n periods, divided by n.
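A sketch of equations 2.7a-b using simple rolling averages of gains and losses, matching the description above (Wilder's original formulation smooths them recursively instead):

def rsi(close: pd.Series, n: int = 14) -> pd.Series:
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(n).mean()      # average up move
    avg_loss = (-delta.clip(upper=0)).rolling(n).mean()   # average down move (positive)
    rs = avg_gain / avg_loss                              # eq. 2.7b
    return 100 - 100 / (1 + rs)                           # eq. 2.7a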

Moving Average Convergence Divergence - MACD

The Moving Average Convergence Divergence (MACD) is a tool for identifying entry and exit points in the market. Although we classified it as a momentum oscillator, it can also be seen as a trend following TI. Its core is the difference between a faster and a slower EMA (usually 12 and 26 days, respectively), which is then smoothed by a 9-day EMA [12]. This TI generates 3 different signals: the MACD line, the signal line and the MACD histogram. Their formulas are:

Figure 2.5: MACD and its 3 signals plot

$MACD(n, m) = EMA_n - EMA_m$ (2.8)

With n representing the number of periods of the faster EMA, and m the number of periods of the slower EMA. After calculating the MACD signal, the signal line is calculated. It is used to trigger buy and sell signals.

$Signal = EMA_9(MACD)$ (2.9)

Next we calculate the histogram line. It visually represents the buy and sell signals, which can be identified by a change of sign in the MACD histogram value.

$Histogram = MACD - Signal$ (2.10)

When the MACD line crosses above the signal line, a buy signal is activated, and when it falls below the signal line, a sell signal is sent. The MACD histogram indicates trend turning points.

As we can observe in fig. 2.5, the histogram area signals trend inversions, which appear when the MACD line (the blue signal) touches the signal line (the orange line).
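A sketch of equations 2.8-2.10, reusing the ema helper from the moving-average example; the default periods are the usual 12/26/9 mentioned above:

def macd(close: pd.Series, n: int = 12, m: int = 26, s: int = 9):
    macd_line = ema(close, n) - ema(close, m)             # eq. 2.8: fast minus slow EMA
    signal = macd_line.ewm(span=s, adjust=False).mean()   # eq. 2.9: s-period EMA of MACD
    histogram = macd_line - signal                        # eq. 2.10
    return macd_line, signal, histogram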

Bollinger Bands - BB

Bollinger Bands (BB) is one of the most popular momentum oscillator TI. It is created by combining two volatility bands with a regular moving average (upper band, lower band and middle band). The volatility bands are calculated based on the Standard Deviation (STDV), and together they create a channel that automatically expands when volatility increases and narrows when the opposite happens. The BB formulas are:

$Upper\ band = SMA(n) + 2 \times \sigma_n(close)$ (2.11a)

$Middle\ band = SMA(n)$ (2.11b)

$Lower\ band = SMA(n) - 2 \times \sigma_n(close)$ (2.11c)

With n being the number of past periods used, which is generally set to 20; once again, close corresponds to the available closing prices. Stocks are considered overbought when the trading signal touches the upper band, thus creating a selling opportunity. Oversold periods are identified when the lower band is touched by the trading signal, creating a buying opportunity [12].

Figure 2.6: Bollinger Bands plot
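As an illustration, the bands reduce to a rolling mean plus/minus two rolling standard deviations; a minimal sketch under the same assumptions as the previous examples:

def bollinger(close: pd.Series, n: int = 20):
    # Equations 2.11a-c: n-period SMA plus/minus two rolling standard deviations.
    middle = close.rolling(n).mean()
    std = close.rolling(n).std()
    return middle + 2 * std, middle, middle - 2 * std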

Commodity Channel Index - CCI

The Commodity Channel Index (CCI) is a momentum oscillator TI that attempts to identify overbought and oversold conditions. To identify them, as in many momentum oscillators, two thresholds are defined, generally 200 and -200. When the CCI is above 200, stocks are considered overbought, and when it is lower than -200, stocks are oversold [12]. Formula:

$CCI = \frac{TP - SMA_n(TP)}{0.015 \times MD(TP)}$, with $TP = \frac{Close + High + Low}{3}$ (2.12)

In equation 2.12, TP stands for Typical Price and is the average of the 3 sample values. The MD present in the CCI formula is the Mean Deviation of TP.

Percentage Price Oscillator - PO

The Percentage Price Oscillator (PO) is a momentum oscillator that measures the difference between two moving averages relative to the value of the larger moving average. The end result is a signal that tells the trader where the short-term average is in relation to the longer-term average.

$PO(n, m) = \frac{EMA_n - EMA_m}{EMA_m}$ (2.13)

With n the time period of the shorter (faster) moving average and m the time period of the slower one.

Chande Momentum Oscillator - CMO

The Chande Momentum Oscillator (CMO) is a technical momentum indicator that also attempts to identify overbought and oversold conditions. The CMO is created by calculating the difference between the sum of all recent higher closes and the sum of all recent lower closes, and then dividing the result by the sum of all price movement over a given time period. The result is multiplied by 100 to give the -100 to +100 range. The defined time period is usually 20 periods.

$CMO_n = 100 \times \frac{Su_n - Sd_n}{Su_n + Sd_n}$ (2.14)

In equation 2.14, Su_n represents the sum of the differences between the current close and the previous close on up days for the specified period n. Sd_n is the sum of the absolute values of the differences between the current close and the previous close on down days for the same period. Up days are days when the current close is greater than the previous close; down days are days when the current close is less than the previous close.
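Following this definition, Su and Sd reduce to clipped rolling sums; a minimal sketch under the same assumptions as the previous examples:

def cmo(close: pd.Series, n: int = 20) -> pd.Series:
    delta = close.diff()
    su = delta.clip(lower=0).rolling(n).sum()             # Su: sum of up-day moves
    sd = (-delta.clip(upper=0)).rolling(n).sum()          # Sd: sum of |down-day moves|
    return 100 * (su - sd) / (su + sd)                    # eq. 2.14, bounded in [-100, 100]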

Average Directional Index - ADX

The Average Directional Index (ADX) is a directional movement indicator whose main objective is to quantify the strength of a given trend, regardless of its type (bullish or bearish). The ADX derives from two other indicators: the Plus Directional Indicator (+DI) and the Minus Directional Indicator (-DI). The +DI measures the presence of an uptrend, while the -DI measures the presence of downtrends. Both of them are usually plotted together with the ADX, to get a better interpretation of price movements. The needed calculations are the following:

1. Calculate the up and down movements, by comparing the current high and low prices with the previous ones.

$UpMoves = High_t - High_{t-1}$ (2.15a)

$DownMoves = Low_{t-1} - Low_t$ (2.15b)

2. Calculate the Positive Directional Movement (+DM) and Negative Directional Movement (-DM). These two formulas are based on the last two equations; their job is to filter values according to their sign. +DM filters out negative values, which represent price declines, replacing them by 0. -DM does exactly the opposite, replacing price-increase values by 0. Note that in eq. 2.15b positive values represent price decreases.

$+DM = Max(UpMoves, 0)$ (2.16a)

$-DM = Max(DownMoves, 0)$ (2.16b)

3. Calculate the Positive Directional Index (+DI) and Negative Directional Index (-DI). This is done by computing a Smoothed Moving Average (SMMA), with respect to the n past periods, of +DM and -DM over the price volatility, expressed by the Average True Range (ATR) TI. The value is then multiplied by 100 in order to be expressed as a percentage.

$+DI = \frac{SMMA(+DM)}{ATR(Close, n)} \times 100$ (2.17a)

$-DI = \frac{SMMA(-DM)}{ATR(Close, n)} \times 100$ (2.17b)

4. Finally, we can compute the ADX as an SMMA of the absolute value of +DI minus -DI, over +DI plus -DI. A compact sketch of these four steps in code follows.

$ADX = SMMA\left(\frac{|{+DI} - ({-DI})|}{{+DI} + ({-DI})}\right) \times 100$ (2.18)
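The thesis does not prescribe an implementation for these steps; the sketch below is one possible reading, assuming pandas Series for the high, low and close prices, and implementing the SMMA as Wilder smoothing (an EMA with alpha = 1/n), which is a common concrete choice:

def adx(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    # SMMA implemented as Wilder smoothing: an EMA with alpha = 1/n.
    smma = lambda s: s.ewm(alpha=1 / n, adjust=False).mean()
    # Step 1: up and down movements (eq. 2.15).
    up_moves = high.diff()
    down_moves = low.shift() - low
    # Step 2: keep only positive movements (eq. 2.16, the simplified form used in the text).
    plus_dm = up_moves.clip(lower=0)
    minus_dm = down_moves.clip(lower=0)
    # Step 3: directional indices, normalised by the smoothed True Range (eq. 2.17).
    tr = pd.concat([high - low,
                    (high - close.shift()).abs(),
                    (low - close.shift()).abs()], axis=1).max(axis=1)
    atr = smma(tr)
    plus_di = 100 * smma(plus_dm) / atr
    minus_di = 100 * smma(minus_dm) / atr
    # Step 4: smooth the normalised absolute DI difference (eq. 2.18).
    return 100 * smma((plus_di - minus_di).abs() / (plus_di + minus_di))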

Double Smoothed Stochastic - DSS

The Double Smoothed Stochastic (DSS) can be a helpful momentum-based TI for swing traders. It applies two EMAs of two different periods to a standard Stochastic %K; values of 8 and 13 are usually used for calculating the EMAs. It ranges from 0 to 100 and identifies overbought and oversold periods. Formula:

$DSS = 100 \times \frac{EMA_n(Close - LowestClose)}{EMA(HighestClose - LowestClose)}$ (2.19)

Figure 2.7: Double Smoothed Stochastic plot

In fig. 2.7 we also plotted the overbought and oversold thresholds, in order to see where the DSS considers the index to be in extreme conditions. As one can see, overbought conditions appear when values are over 80, and oversold ones when values are below 20.

2.1.6 Volatility

Volatility is a measure of uncertainty and risk. It reflects how prices are currently moving. A high volatility value means that the range of values assumed by a security can change very dramatically in a short period of time. On the other hand, low volatility means that prices are stable, decreasing or increasing at an acceptable rate.

Average True Range - ATR

The Average True Range (ATR) is usually the TI chosen by traders to quantify the amount of volatility existing on a specific day. The ATR looks at how far the price swings, comparing the highest price values with the lowest ones.

$ATR_t = \frac{ATR_{t-1} \times (n - 1) + TR_t}{n}$ (2.20)

Where t represents the current day and n the number of past time periods. TR_t represents the True Range (TR), which can be calculated as:

$TR_t = \max\{High_t - Low_t,\ |High_t - Close_{t-1}|,\ |Low_t - Close_{t-1}|\}$ (2.21)

This calculation retrieves the highest value among the 3 calculated ranges, assuming it as the true one.

We also used the Average True Range Percent (ATRP). It is almost the same as the ATR, but results from dividing the ATR by the close values and converting the result to a percentage.

$ATRP_n = \frac{ATR(Close, n)}{Close} \times 100$ (2.22)
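A possible pandas sketch of equations 2.20-2.22, under the same assumptions as the previous examples; the recursive smoothing in eq. 2.20 is reproduced exactly by an EMA with alpha = 1/n:

def true_range(high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    # Equation 2.21: the largest of the three candidate ranges.
    return pd.concat([high - low,
                      (high - close.shift()).abs(),
                      (low - close.shift()).abs()], axis=1).max(axis=1)

def atr(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    # ewm with alpha = 1/n reproduces the recursion in eq. 2.20:
    # ATR_t = (ATR_{t-1} * (n - 1) + TR_t) / n
    return true_range(high, low, close).ewm(alpha=1 / n, adjust=False).mean()

def atrp(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    # Equation 2.22: ATR as a percentage of the closing price.
    return atr(high, low, close, n) / close * 100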

2.1.7 Other indicators

In this subsection we highlight the indicators used as features in this work that do not fit into the above-mentioned categories. We also present some indicators that do not belong to the TI family but, since they are used as features as well, we decided to describe them in this section.

Detrended Price Oscillator - DPO

The Detrended Price Oscillator (DPO) is another oscillator TI used in this work. The DPO is an extremely useful tool, because it removes the trend from prices, making it easier to identify the cyclical price movements present in a given index. Although being an oscillator, the DPO is not considered a momentum one, because it is not a real-time indicator and cycles are identified using a time displacement technique. Cycles are estimated by counting the periods between peaks or troughs.

$DPO_n = Close - SMA_{\frac{n}{2}+1}(Close)$ (2.23)

The displacement effect of this indicator is caused, as we can see in equation 2.23, by the subtraction of an SMA over the last n/2 + 1 periods, with n being the number of past periods for which the DPO signal is calculated.

Kurtosis - Kurt

Kurtosis is not a financial TI but rather a statistical measure that indicates the "tailedness" of the probability distribution of the provided data around the mean. It indicates the combined weight of a distribution's tails relative to a Gaussian distribution. Data can be heavy-tailed, meaning excess data present in the tails, or light-tailed, which suggests a lack of data in them. Kurtosis is applied to financial returns instead of the market index: since financial time series present a non-stationary behaviour, the usage of returns is much more reasonable, and their distribution resembles a Gaussian distribution much more closely than absolute market values. The kurtosis of a Gaussian distribution is 3. Kurtosis is given by the following formula:

$kurtosis = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^4 - 3$ (2.24)

In the above equation, n represents the number of samples, x_i an individual sample, x̄ the mean, and σ the standard deviation of the given data. Note that this formulation retrieves the excess kurtosis present in the data, since 3 is subtracted from the data's kurtosis.

Skewness - SKEW

Like Kurtosis, Skewness is also a statistical measure that quantifies shape properties of a data distribution. Skewness measures the asymmetry of the distribution when comparing it with a Gaussian distribution. It retrieves how skewed the data is, in terms of both amount and direction. If the skewness is 0, the distribution of the data is perfectly symmetric. A positive skew indicates a longer right tail, while a negative value indicates a longer left tail. Like kurtosis, it is also applied to returns. Its formula is:

skewness = (1/n) Σ_{i=1}^{n} ((x_i − x̄) / σ)^3    (2.25)

With n representing the number of samples, x_i an individual sample, x̄ the mean, and σ the standard deviation of the given data.
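As an illustration, both measures can be obtained directly from a series of returns with pandas. The price values below are made up, and note that pandas uses unbiased sample estimators, so the results differ slightly from the population formulas 2.24 and 2.25 for small n; pandas already reports excess kurtosis, matching the −3 term of equation 2.24.

```python
import pandas as pd

# Hypothetical close-price series; in this work returns are used
# instead of raw prices because of their non-stationary behaviour.
close = pd.Series([1.1702, 1.1711, 1.1695, 1.1688, 1.1704, 1.1720])
returns = close.pct_change().dropna()

print(returns.kurtosis())  # excess kurtosis (Gaussian -> 0), cf. eq. 2.24
print(returns.skew())      # sample skewness, cf. eq. 2.25
```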

Standard Deviation - STD

Standard Deviation is a statistical measure that indicates how spread out the samples of a dataset are relative to its mean. It is generally represented by the Greek letter σ. Considering n the number of available samples, x_i a sample and x̄ the mean of all samples, we can write the standard deviation as:

σ = √( Σ_{i=1}^{n} (x_i − x̄)^2 / (n − 1) )    (2.26)

Standard Variance - STV

The concept of Standard Variance is analogous to Standard Deviation. The main difference is that the deviations are squared, which shifts absolute units to squared units. It measures the average degree to which each point differs from the mean.

σ² = Σ_{i=1}^{n} (x_i − x̄)^2 / (n − 1)    (2.27)

2.2 Machine Learning

Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) that, given a great amount of information about a specific problem, tries to provide an answer to it by describing the problem in discrete terms, with the main focus of providing a correct answer or performing the correct action. Contrary to many other subareas of the Computer Science field, which apply a logical, deductive way of working, ML achieves results differently, due to its inductive behavior, which can be seen as an empirical way of gaining knowledge [15]. Induction is the act of supposing something based on previous observations or assumptions already made about it. For example, we know that the sun rises every day, so we could inductively suppose that tomorrow the sun is going to rise again. This notion of inferring or inducting domain knowledge into an AI algorithm makes it gain some sort of understanding about the problem and the environment where it is inserted. This capacity is achieved by feeding data into algorithms. Data is normally divided into columns, usually called features, that quantify different dimensions of the problem. In the sun example, we could have as daily features the current weather and the current wind. The ML algorithm is going to try to

find correlations between the existent values in each one of them, in order to know if tomorrow the sun is going to rise or not.
Regarding ML, one can identify 3 types of learning techniques: supervised learning, unsupervised learning and reinforcement learning. Since this work deals with supervised learning, it is worthwhile to mention the 2 types of problem approaches within it: classification and regression. Classification aims to classify something with respect to a discrete set of values. Alternatively, regression is related to predicting continuous values. Supervised Learning models are algorithms that learn from correct examples. The input must be divided into two vectors: the model features x, and the output label or target variable y. Model features were already explained above, and represent the dimensionality of the problem. The target variable y represents correctly labelled examples corresponding to a given set of features, in a way that y = f (x), with f being the mapping function that correctly represents the given data, i.e., the function that correlates the x values with a y value.
When running a supervised learning algorithm, 2 different phases are normally performed: the training phase and the testing phase. This is done with two different datasets, obtained from one original dataset. They both result from splitting the initial data into two smaller chunks, the trainset and the testset. The trainset usually has a size of 70% to 85% of the original data and, as the name indicates, is used to train the model. With it, the algorithm tries to find an approximation of the mapping function f, using the y vector values as a "gold standard". Through this procedure, the model gains insight about the given data, making it possible to predict new values of y according to the found function. This is where the testset comes into use. Generally with a size between 15% and 30% of the original data, it is used to test the trained model with new, never-seen data, outputting a score or prediction for it. The key difference between train and test lies in the fact that when test data is fed to the algorithm, the y vector is withheld, because it contains the desired results. It is later used when evaluating the model, by comparing it with the achieved results using different evaluation metrics.
In order to create a ML model that correctly classifies new examples, the inference process must be carefully performed. Too much or too little training can result in two well-known problems in this domain: overfitting and underfitting. Overfitting occurs when the created model learns the training data too well. This means that all the details and noise present in it are assumed by the model in such a way that the target function f is not able to correctly map new elements, which results in poor performance during the testing period. Underfitting is exactly the opposite. It occurs when a model is not able to capture the underlying structure present in the given data, hence not being able to correctly approximate f. A compromise between fitting and generalization should be made in order to create a balanced model that is able to capture the relevant assumptions present in the training data without compromising its performance when new elements are presented to it.
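The chronological train/test split described above can be sketched in a few lines. This is an illustrative helper, not the exact routine used by the system, and it assumes the data lives in a pandas DataFrame ordered by date (shuffling is avoided because market data is a time series):

```python
import pandas as pd

def train_test_split_ts(df: pd.DataFrame, train_size: float = 0.8):
    """Chronological split: the first `train_size` fraction of the
    records trains the model, the remaining rows test it."""
    cut = int(len(df) * train_size)
    return df.iloc[:cut], df.iloc[cut:]

# Hypothetical usage with a feature dataframe `dataset`:
# train_df, test_df = train_test_split_ts(dataset, train_size=0.8)
```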

2.2.1 Artificial Neural Networks

Artificial Neural Networks (ANN) are a powerful information processing model designed using as inspiration the way the human biological nervous system works and deals with information. This type of architecture comes from an AI movement known as connectionism [16], which believes that knowledge is stored in the connections between interconnected processing units, usually known as neurons. Unlike other types of models available in the ML domain, ANN are also distributed models. Each concept learned by them is represented by the combination of many neurons, and each individual neuron participates in the representation

of many different concepts. Another important factor that sets them apart is the parallel way of processing information, in contrast with the sequential approach used by the majority of prediction models. In a sequential model, it is possible to use reverse engineering to figure out which premises made the algorithm reach a specific conclusion. In ANN that procedure is not possible, because all neurons learn at the same time. That is why people usually say that ANN act as a "black box".

Perceptron Model

To understand how neural networks work, it is important to first introduce the concept of the perceptron. The perceptron was the first model that tried to replicate the structure of a biological neuron. It was developed in the 1950s by Frank Rosenblatt [17], and its behavior resembles the logical gates AND and OR available on computers.
A perceptron is essentially a gate that receives several inputs and produces a single binary output that corresponds to a decision considering the received inputs. To generate a decision making mechanism, a set of weights is attributed to the connections between neurons. The number of weights is equal to the number of inputs received by the network. To activate the neuron (having 1 as output), the weighted sum of the inputs must be greater than a certain threshold. This procedure is done by a step function, and expresses how a perceptron can weigh up different kinds of evidence in order to make decisions. Fig 2.8 represents the perceptron architecture.

Figure 2.8: Perceptron architecture

The final output is computed by applying the step function to the weighted sum of inputs, i.e, the sum of
each input multiplied by its weight. It is given by the formula:

output = 0 if Σ_{j=1}^{n} w_j x_j ≤ threshold, and output = 1 if Σ_{j=1}^{n} w_j x_j > threshold    (2.28)

With w_j being the weight that corresponds to the input x_j. It is possible to obtain different results by varying the values of the weights and the threshold. This allows us to create new models, capable of outputting different decisions. This may seem an inflexible and simple approach to achieve a decision making model, but if we put together a large number of connected perceptrons forming a complex layered network - a Multilayer Perceptron Network (MLP) - it is possible to produce incredibly sophisticated decisions.

As one can expect, a network created from perceptrons represents a large number of variables that need to be controlled. The number of inputs, weights and thresholds could grow exponentially, and therefore formula 2.28 should be simplified. MLPs store their parameters as matrices. Therefore, we can represent the weighted sum Σ_{j=1}^{n} w_j x_j as a dot product and rewrite it as w · x, where w and x are vectors whose components are the weights and inputs, respectively. Another change is to move the threshold to the opposite side of the expression and substitute it by what is known as the bias, b ≡ −threshold [18].
We can interpret b as a measure of how prone the perceptron is to "fire". The whole formula can be rewritten as:

output = 0 if w · x + b ≤ 0, and output = 1 if w · x + b > 0    (2.29)

Although things now seem more consistent, there is still a huge downside in the perceptron architecture: the way the step function reacts to value changes in the network's tunable parameters. A small change in weights or biases can easily flip the neuron output from 0 to 1, thus activating it. This could potentially make the network assume a "cascade" behaviour, where a small change in one neuron changes the output of other attached neurons. That is why we introduce Feedforward Neural Networks in the next subsection, which overcome this problem using smoother activation functions that, facing small changes in input parameters, produce small changes in output values.

Feedforward Neural Network - FNN

As introduced in the last part of subsection 2.2.1, a group of interconnected perceptrons forms a network. A Feedforward Neural Network (FNN) is simply a group of many layered, fully connected neurons, where the connections between them do not form any type of cycle, i.e., information flows only in one direction, from the input nodes to the output nodes. This means that there are no recursive connections or loops linking neurons and making information go backwards, as in Recurrent Neural Networks (RNN).

FNN are part of a subclass of ML algorithms known as Deep Learning (DL), which concerns algorithms that try to reproduce the structure of the brain. They use a "cascade" of multiple fully connected layers besides the traditional input and output layers, responsible for non-linear processing, transformation and feature extraction. They usually have built-in non-linear functions. This is needed for two reasons: first, the majority of problems cannot be explained by a linear function - if they could, there would be no need for DL, and conventional ML methods could be used. The second is that linear combinations of linear functions result in a linear function. Fig 2.9 showcases the FNN architecture.
Fig 2.9 showcases FNN architecture.

The usage of non-linear functions in the middle layers leads us to one of the finest advantages of using standard FNN (and other types of NN): they can approximate any given non-linear function. This means that, in theory, NNs can solve any existent problem, since the majority of problems can be translated into mathematical terms by means of a function. This was proved by Hornik et al. in 1989 with the universal approximation theorem [19], showing that for any continuous function f on a compact set K, there exists a feedforward neural network, having only a single hidden layer, which uniformly approximates f to within an arbitrary ε > 0 on K.

Figure 2.9: Feedforward Neural Network architecture

Like other ML algorithms, FNN can be used for several different tasks, namely supervised learning and unsupervised learning. Supervised learning tasks can be divided into regression and classification. This defines the type of activation present in each layer and neuron of the network (different layers can contain neurons with different functions). Input neurons just let information pass and usually do not have any type of activation function. Hidden and output neurons use more refined functions for neural activation. Below we present a list of the most common activation functions, followed by a short NumPy sketch. Keep in mind that z = w · x + b, as explained in eq. 2.29.

• Linear: This represents a straight-line function, where the computed activation is proportional to the input. The output produced by a function of this type will lie in the same domain as the input values. For example, if the input is a continuous value, then the output y of this activation respects the condition y ∈ R. This is useful when one deals with regression problems and tries to predict a continuous value. In this type of problem, linear activations must be applied to the final layer of the network.

σ(z) = z (2.30)

• Sigmoid: The sigmoid or logistic function is a non-linear function that resembles the behaviour of the perceptron step function in a smoother way. Like it, it is bounded between 0 and 1, but instead of a step that abruptly changes the output from 0 to 1 above a certain input value, it forms an "S" shaped curve that allows the output to take any value in the interval between 0 and 1, as shown in fig 2.10. Therefore, small changes in weights or biases result in small changes in the neuron's output. It is applied to the hidden layer neurons, or to the final layer of the network if we are in the presence of a binary classification problem.

σ(z) = 1 / (1 + e^{−z})    (2.31)

Figure 2.10: Sigmoid function

• Softmax: The softmax function is a generalization of the sigmoid function. While the sigmoid function can only handle the prediction of 2 classes, the softmax function is used when dealing with multi-class classification problems. Like the sigmoid function, it also squashes each output to be between 0 and 1, and divides each output such that the total sum of the outputs is equal to 1. This transforms the output into class probabilities, calculating the probability distribution over K different classes. It should be used in the last layer of the network.

σ(z)_j = e^{z_j} / Σ_{k=1}^{K} e^{z_k}    (2.32)

• Hyperbolic Tangent: The hyperbolic tangent or tanh is another non-linear function applied to the neurons present in the hidden layer. It is similar to the sigmoid, but the output is bounded between -1 and 1. It is also a zero-centred function (the sigmoid is not), which can be beneficial during the learning phase of the network [20].

σ(z) = tanh(z) = 2 / (1 + e^{−2z}) − 1    (2.33)

• Rectified Linear Unit: Rectified Linear Units, usually known as ReLUs, are the simplest non-linear activation functions that can be used. Their output is simply 0 if the input value is less than 0, and the raw input otherwise. A good thing about using ReLUs is that training a network becomes much faster. When using tanhs or sigmoids in the hidden neurons, almost every neuron in the network is likely to be activated when processing an input, i.e., almost every activation is used to describe an output, and the activations can become really dense. With ReLU only the most important neurons remain activated, making activations sparser and more effective.

σ(z) = max(z, 0) (2.34)
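For reference, the five activations of equations 2.30-2.34 can be transcribed into NumPy as follows; subtracting max(z) in the softmax is a common numerical-stability trick and not part of equation 2.32 itself:

```python
import numpy as np

def linear(z):   # eq. 2.30
    return z

def sigmoid(z):  # eq. 2.31
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):  # eq. 2.32; max-subtraction only improves stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

def tanh(z):     # eq. 2.33
    return np.tanh(z)

def relu(z):     # eq. 2.34
    return np.maximum(z, 0.0)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```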

Backpropagation

Backpropagation is a method for training ANN, frequently described as the neural nets' learning mechanism. Developed by Werbos in 1974 [21], it is used for optimizing the final network output by adjusting its weights. The optimization is performed during the training phase, through the usage of Gradient Descent [22] or other variants of this mathematical optimization algorithm.
When training an ANN with the backpropagation algorithm, one can identify two major procedures: the forward pass and the backward pass. The first one is simply the forward passage of the input values from the input layer to the output layer, including all the transformations performed in the hidden layers of the network. When arriving at the last layer, a cost function is computed between the outputted values and the target vector that contains the correctly labelled data. This cost function expresses the difference between the achieved results ŷ and the true values y, and is going to be minimized during backpropagation. Different cost functions are used for different problems. The most used ones are:

1. Mean Squared Error: The mean squared error (MSE) is the most commonly used loss function when dealing with regression problems. It simply calculates the average of the squared deviation errors, that is, the differences between the true values and what was estimated. Squaring removes any negative signs, and also gives more weight to larger differences.

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2    (2.35)

With n being equal to the number of predictions, yi the true value or estimator, and yˆi the predicted value.

2. Categorical Cross Entropy: The categorical cross entropy is a function used in classification problems.
It can be used for binary and multi-class problems.

Loss = −(1/n) Σ_{i=1}^{n} y_i log(ŷ_i)    (2.36)

With n being equal to the number of predictions, yi the true value or estimator, and yˆi the predicted value.
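Both cost functions are straightforward to express in NumPy. This is an illustrative sketch with made-up label vectors, where the cross entropy sums over classes and averages over samples, one common reading of equation 2.36:

```python
import numpy as np

def mse(y, y_hat):
    """Mean Squared Error (eq. 2.35), used in regression problems."""
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    """Categorical cross entropy (eq. 2.36); eps avoids log(0)."""
    return -np.mean(np.sum(y * np.log(y_hat + eps), axis=-1))

# One-hot true labels vs. predicted class probabilities (assumed values)
y = np.array([[1, 0], [0, 1]])
y_hat = np.array([[0.9, 0.1], [0.2, 0.8]])
print(mse(y, y_hat), cross_entropy(y, y_hat))
```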

Note that each one of these functions, as well as the other functions used in activations, shares an important property: they are all differentiable, meaning that their derivative exists at every point. This is a key factor for the backward pass. After computing the prediction errors using the cost function, the network weights must be updated in order to improve the outputted results, ultimately converging the system to the smallest possible error. This is done by using Gradient Descent, an algorithm that tries to find the optimal parameters that allow the network to minimize the cost function to a global minimum. To update the parameters, a constant ∆W_j should be added to each weight. For each individual weight, ∆W_j is given by:

∆W_j = −η × ∂E_total / ∂W_j    (2.37)

This calculates the partial derivative of E_total, the total error given by the cost function, with respect to W_j, the existent network weights, i.e., how much a change in W_j will affect the total error E_total. This is why every

system function should be differentiable. Calculating ∆W_j takes advantage of a well-known calculus formula, the chain rule. The chain rule says that if a function g is differentiable at x, and another function f is differentiable at g(x), then the composite function f ◦ g is also differentiable at the point x [18], and its derivative is given by:

(f ◦ g)′(x) = f′(g(x)) · g′(x)

If z = f(y) and y = g(x), then we can write:    (2.38)

∂z/∂x = (∂z/∂y) · (∂y/∂x)

The parameter η is called the learning rate, and it defines the speed of learning. It should be carefully set: a large learning rate could make the gradient take large steps towards the optimal solution, raising the possibility of skipping an optimal minimum. On the other hand, a too small learning rate could make the system too slow in terms of convergence.
This algorithm updates its parameters for a specified number of epochs. An epoch equals a forward pass plus a backward pass. When the final epoch is reached, the final weights are updated and stored, thus creating a final model, fitted to the given data.
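A toy sketch of the update rule of equation 2.37 illustrates the role of η; the quadratic error function below is an assumption chosen only because its gradient is trivial:

```python
import numpy as np

def gradient_step(w, grad, lr=0.01):
    """One Gradient Descent update (eq. 2.37): each weight moves
    against its partial derivative, scaled by the learning rate."""
    return w - lr * grad

# Toy example: minimize E(w) = w^2, whose gradient is 2w
w = np.array([5.0])
for epoch in range(100):
    w = gradient_step(w, grad=2 * w, lr=0.1)
print(w)  # approaches the minimum at 0
```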

2.2.2 Genetic Algorithms

The infinite variety of living species on planet Earth is the result of one of the most sophisticated and unique mechanisms one could experience: natural selection. Evolutionary Computation (EC) is a family of algorithms inspired by this process, which can be described as empirical problem solvers that use a set of stochastic and metaheuristic optimization tools in order to achieve an optimal or near-optimal problem solution. Genetic Algorithms (GA) are a powerful and extremely versatile optimization tool that is part of this broader family. As such, they try to mimic the process observed in the natural evolution of species, being often considered an evolutionary parameter tuning algorithm. They follow the "survival of the fittest" methodology introduced by Charles Darwin in evolutionary theory [23]. This implicitly means the application of that principle in the breeding process of new generations, composed of different individuals. Individuals are represented by chromosomes. Therefore, each generation is represented by a group of chromosomes, and each chromosome represents a point in the existing search space, interpreted as a possible problem solution. From a biological point of view, a chromosome is a DNA molecule composed of a set of different genes. We can translate this simple analogy into the algorithmic spectrum by qualifying a chromosome as an array structure where each element, a gene, is a weight that represents a model parameter. To apply Darwin's methodology, a fitness function is applied to each one of the created individuals. The fitness function tells us the performance of an individual with respect to a set of different parameters (mapped as genes) that try to explain a problem domain. After its calculation, a set of operations present in evolutionary theory is applied, in order to form a new generation of individuals based on the characteristics of the previous generation. In this process some individuals are rejected based on the performance calculated by the fitness function. Only the elements with the highest performance values will survive, and those will be the ones used to breed new chromosomes. At the end, the fittest element is the one chosen as the optimized individual. The introduction

of these biological processes accelerates the algorithmic convergence, making GAs viable alternatives to other more exhaustive and potentially expensive parameter tuning methods like Grid Search and Random Search. The algorithm can be decomposed into the following steps:

1. Population initialization - At the beginning of the algorithm, chromosomes are randomly generated in order to achieve diversity. If one wants to obtain the best fitted optimized values for a given problem, then having a diverse set of individuals is a key factor for finding an optimal solution. A generally used and well-accepted rule is to initialize the algorithm with a population size approximately equal to 10 times the dimensionality, i.e., the number of genes [24]. Individuals have a fixed number of genes. Each gene is coded with values in a given interval, which usually is binary, but could represent a different range of values. This decision is problem dependent, but one should keep in mind that larger ranges lead to larger search spaces. A very large search space could be beneficial in terms of finding an optimal solution, but it could also consume too much time and too many resources. A binary, 4 dimensional chromosome is represented in fig. 2.11.

Figure 2.11: Binary 4 gene chromosome representation

2. Fitness function calculation - This function's main goal is to evaluate the solution domain. It takes an individual candidate as input and outputs how fit it is as a solution to a given problem. Different fitness functions can be used, since this decision is strictly domain dependent. For example, imagine an optimization problem concerning a classifier that aims to classify its input data into 3 different categories. To achieve maximum performance in this task, one could select as fitness function the well-known classification metric accuracy. This way, the GA is going to evaluate each individual by calculating the number of correct predictions in terms of percentage. In this case the GA faces a maximization problem, and the fittest individual will be the one with the highest accuracy value.

3. Selection - After evaluating every solution available in a given population, it is time to apply the "survival of the fittest" principle. As previously explained, only the fittest solutions, ranked with the highest fitness values, are going to be selected to proceed and take part in the breeding process of the new chromosome generation. There are several ways of performing this decision process [25]:

• Roulette Wheel Selection - The Roulette Wheel Selection (RWS) is part of a set of methods where each individual can turn into a parent chromosome with a probability proportional to its fitness function value. This probability is given by:

p_i = f_i / Σ_{i=1}^{n} f_i    (2.39)

With f_i being the fitness value of a population individual. A roulette wheel is then formed with each individual's probability, and a random selection (resembling spinning a wheel) is performed. Note that individuals with higher probabilities will have a larger slice of the wheel and obviously a higher probability of being chosen as candidates for the next generation breeding process.

• Stochastic Universal Sampling - The Stochastic Universal Sampling (SUS) is similar to RWS, since each individual also has a probability proportional to its fitness. The main difference between these two methods is that instead of "spinning the roulette wheel" k times to obtain k parents, all parents are chosen in just one spin of the wheel. This tries to give weaker members a chance of being used to form the next generations.

• Tournament Selection - Tournament Selection (TOS) is one of the most used methods in the literature. TOS implements a tournament mechanism among population individuals, where only the ones with the highest fitness remain in the end. For a given population of individuals, n tournaments are performed. Each tournament is done by selecting k individuals by random sampling. The element in each tournament with the highest fitness value is chosen to be a parent in the next generation. Small values of the tournament size k will preserve diversity among the population, while large values will give smaller chances to weak individuals. Fig. 2.12 presents the above explained behaviour.

Figure 2.12: Tournament selection method

• Rank Selection - Rank Selection is a method that attempts to prevent early convergence. It is mostly used when individuals in the population have similar fitness values. Instead of using fitness values directly to perform the selection, this method creates a rank constructed from the fitness values. For example, consider 4 individuals with fitness values 14, 5, 36 and 75, respectively. To create ranks, the fitness values are ordered in ascending order: 1st: 5, 2nd: 14, 3rd: 36 and 4th: 75. The sum of all ranks is then computed, being equal to 10 (1+2+3+4). Individual probabilities are now calculated by dividing each rank by the sum of all ranks.

• Random Selection - Simply selecting parent chromosomes by random sampling. This improves
diversity but slows down the total convergence time.

4. Crossover - In order to generate new individuals, a crossover methodology is used. This process selects two or more parent chromosomes (chosen via the selection process) and creates new individuals based on a combination of the parents' genes. To perform the crossover, a splitting point must be chosen. The two most common methodologies are one-point crossover and multi-point crossover. In the first one, each parent chromosome is divided at the same randomly generated crossover point, and the created tails are combined to form a new individual. Multi-point crossover is the generalization of the previous method, but with n splitting points. Fig 2.13 represents this operation.

Figure 2.13: Crossover operation

5. Mutation - Mutation is another genetic operator, commonly performed after crossover takes place. As the name indicates, the mutation operation is simply the process of mutating one or more gene values in the chromosome vector structure. It is used to introduce and preserve diversity in the population. Individuals are selected for mutation based on a low probability p_m. This probability should be low, otherwise the GA will resemble a random search mechanism. Mutated values are sampled from a distribution function selected according to the gene's value domain. For example, if we are using a chromosome that has only binary genes, a gene mutation will correspond to a bit flipping from 0 to 1 or vice-versa. If each gene assumes integer values in a range between two values, then a mutation value could be sampled from a Uniform distribution, defining the lower and upper bounds of the distribution.

6. Fitness evaluation - This part of the algorithm is extremely important, because it is where it is determined when a GA run will end. The fitness value calculated by the fitness function, now attached to each received chromosome, is again checked to evaluate the obtained fitness of every candidate. Until the termination criteria is reached - an initially predefined number of generations - the algorithm loops back, performing the crossover and mutation operations again, therefore initializing the breeding cycle of a new generation. The whole cycle of the GA is depicted in fig 2.14, and condensed in the sketch following the flowchart.

Figure 2.14: Genetic algorithm flowchart
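The whole cycle can be condensed into a short sketch. This is a didactic toy implementation (plain Python, integer genes, sum as fitness), not the optimizer used in this work:

```python
import random

def genetic_algorithm(fitness, n_genes, pop=50, ngen=30,
                      cxpb=0.7, mutp=0.1, tournsize=3,
                      low=2, high=30):
    """Minimal GA: tournament selection, one-point crossover and
    uniform integer mutation, mirroring the cycle of fig. 2.14."""
    population = [[random.randint(low, high) for _ in range(n_genes)]
                  for _ in range(pop)]
    for _ in range(ngen):
        scored = [(fitness(ind), ind) for ind in population]
        offspring = []
        while len(offspring) < pop:
            # Tournament selection of two parents
            p1 = max(random.sample(scored, tournsize))[1]
            p2 = max(random.sample(scored, tournsize))[1]
            child = list(p1)
            if random.random() < cxpb:           # one-point crossover
                cut = random.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            for i in range(n_genes):             # uniform mutation
                if random.random() < mutp:
                    child[i] = random.randint(low, high)
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)

# Toy fitness: maximize the sum of genes
best = genetic_algorithm(fitness=sum, n_genes=6)
print(best)
```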

2.3 Related Work

This section is intended to address and present solutions related to the work that was developed. The section is divided into 2 subsections. The first one is directly related to works where NN take a major part in the problem solution, in financial contexts. The second subsection reports works performed with GAs. At the end, a summary table is provided, where the most relevant solutions can be compared.

2.3.1 Related works on Neural Networks

Over the last 20 years, NN have been a largely used model when it comes to finance-related works. They have proved to be solid models, capable of extracting and correlating different features in order to achieve useful and reliable results. FNN are one of the most used models, although other NN types have been used for forecasting and modelling of financial markets.
Yao and Tan [26] proposed a neural network model that received as input a set of different technical indicators along with weekly time series data, to capture the underlying "rules" of the movement in currency exchange rates. They tested it with 5 major currencies against the US dollar (USD), namely the Japanese Yen (JPY), Deutsch Mark (DEM), British Pound (GBP), Swiss Franc (CHF) and Australian Dollar (AUD). They structured the problem as a regression one, forecasting the weekly market closing price for each market. Evaluation was performed using the Normalized Mean Squared Error (NMSE), weekly return and directional change, expressed as gradient. Regarding these metrics, the authors acknowledged that the aim of market forecasting is achieving the highest possible value of trading profits, and it does not matter whether the forecasts are accurate or not in terms of NMSE or gradient. They are useful measures to assess the overall model quality, but should be carefully interpreted, because they do not consider the existent market trend. They experimented with different topologies of the NN model with TI as features for the above-mentioned markets, achieving as their best result a return of 28.49% for the CHF market. They ultimately report that the obtained results imply that none of the studied markets follow a random walk nor are highly efficient, making the forecasting task realistically possible.

Kara et al. [27] also confirmed the supremacy of NN models in financial applications, in relation to classical ML approaches, on the Istanbul Stock Exchange (ISE) National 100 Index. They deployed a comparison between two classification techniques, NN and Support Vector Machines (SVM). They also used TA as an input feature generator, selecting ten TI as inputs of the proposed models. Instead of predicting future market values, they decided to forecast the direction of the daily change in the stock price index, creating labels that identify stock movements. Labels are categorized as 0 or 1. If the ISE National 100 Index at time t is higher than that at time t − 1, the direction at t is 1. If it is lower, the direction at t is 0. To evaluate the deployed models they used daily data from 1997 to 2007. They concluded that both the ANN and SVM models showed significant performance in predicting the direction of stock price movement. However, the performance of the ANN model (75.74%) was found to be significantly better than that of the SVM model (71.52%).
Jigar Patel et al. [28] predicted the CNX Nifty and S&P Bombay Stock Exchange (BSE) Sensex indices from Indian stock markets with a fusion of different machine learning techniques. A two-stage fusion approach supporting ten different technical indicators as input features was proposed and applied to three different models: an SVR combined with an Artificial Neural Network (SVR-ANN), a twofold SVR model (SVR-SVR), and finally an SVR combined with a Random Forest model (SVR-RF). The first stage of each one of these models applied an SVR model to each input feature in order to predict the next day's value of that feature. The second stage consisted of an ANN, SVR or RF model, fed with the ten future values of the previously predicted statistical parameters. A comparison between a single-stage and a two-stage version of each one of the deployed solutions was performed. The results proved that a combination of two techniques can achieve impressive results, totally overcoming a single-stage methodology. The authors justify this with the fact that in a two-stage procedure, the prediction models in the second stage have to identify the transformation from technical parameters describing the (t + n)th day to the (t + n)th day's closing price, while in a single-stage approach, prediction models have to identify the transformation from technical parameters describing the tth day to the (t + n)th day's closing price. The best overall prediction performance was achieved by the SVR-ANN model.
A common approach to financial market analysis is also the forecasting of returns. Qiu et al. [29] applied this methodology to the Japanese Nikkei 225 index. They collected 71 variables that included financial indicators and macroeconomic data, sampled monthly, covering the period from November 1993 to July 2013. A feature selection algorithm called fuzzy surfaces was used to reduce the data dimensionality to a minimal combination of explanatory variables. They discovered that from the initially collected 71 features, only 18 were statistically significant. Finally, data was fed into a three-layer ANN with a regular backpropagation mechanism, with a GA for parameter optimization. Performance was assessed using the Mean Squared Error (MSE). Results showed an MSE value of 0.0017 for the best model, and an average MSE of 0.1219, obtained from 900 training experiments.
There are also other types of NN that have achieved promising results in the financial domain. Long Short-Term Memory networks (LSTM) are one of them. They are a state-of-the-art RNN model, widely used in the Natural Language Processing (NLP) domain for sequence learning, whose abilities are gradually being explored in market applications. A good example of their application was designed by Fischer and Krauss [30]. They used an LSTM network in order to predict the next day's return, transformed into binary labels. To convert continuous return values into discrete labels, they created a strategy based on the cross-sectional median return of all stocks in period t + 1 (the same period as the target variable). If the return value was smaller than the median value, a label 0 would be given, otherwise it would be 1. As input features they used 240 past return values

(approximately one trading year), i.e., every t − n return with n from 0 to 239. They compared the LSTM architecture with Random Forests (RAF), FNN and Logistic Regression (LOG). The LSTM achieved the best result among all the used strategies, with an accuracy of 54.3% and a mean return of 46% (excluding transaction costs).

2.3.2 Related works on Genetic Algorithms

As previously stated in subsection 2.2.2, a Genetic Algorithm (GA) is an evolutionary computing technique designed to search for an optimal or near-optimal solution in the search space where the algorithm is confined, always following a methodology that tries to mimic and approximately replicate the principles of genetics and natural selection.
A good example of how parameter optimization can be performed was presented by Chih-Hung Wu et al. [31]. This work aimed to develop a genetic-based SVM (GA-SVM) model that could automatically determine the optimal parameters, C and σ, of the SVM with the highest predictive accuracy and generalization ability simultaneously. The model was built to predict bankruptcy, and was tested on the prediction of financial crises in the Taiwan market, achieving results that empirically confirm that the GA-SVM model achieves the best predictive accuracy when compared with the other tested models, namely classic financial statistical predictive methods and an FNN. To optimize the initial parameters of the SVM, the GA-SVM first generates a random population, where real values of C and σ are coded into the chromosome structure of each element of every generation. Finally, the model searches for optimal values iteratively, applying a survival of the fittest strategy.
A work more similar to the one proposed in this thesis was done by Sezer et al. [32]. They built a deep MLP neural network for buy-sell-hold predictions, with TI parameters optimized by a GA. As input data, they used daily stock prices of the Dow 30 stocks between 1/1/1997 and 12/31/2006 for training purposes, and stock prices between 1/1/2007 and 1/1/2017 for testing. The target vector of buy-sell-hold points was created based on values given by an RSI trading strategy, in combination with a trading strategy based on the SMA to identify uptrend and downtrend market periods. The GA was used to find the best RSI values for buy and sell points in downtrends and uptrends, in a randomly initialized population of 50 individuals. Generated chromosomes are divided into two distinct parts: 4 initial genes for identified uptrend periods and 4 genes for downtrend periods. Each gene codifies different parameters. The first one randomly creates RSI Buy values between 5 and 40. These correspond to RSI oversold periods, which according to RSI trading strategies are plausible periods for market investments. Based on this value, RSI Buy intervals are created randomly between 5 and 20 in the second gene. The third and fourth genes are analogous to the first two. The only difference is that they are related to the sell periods, making the RSI Sell signal (in the third gene) assume values between 60 and 95, which according to RSI trading strategies are considered overbought market periods. The most profitable chromosome is retrieved, and training data is generated according to it. The achieved results proved that, using this strategy, the created system had the capability to beat the classical Buy and Hold trading strategy in 16 out of 29 Dow 30 stocks.
Yusuf and Asif Perwej [33] also proposed a system that combined a GA with a FNN, optimized for the BSE market. They used a GA to search a space of ANN topologies and select those that optimally matched their criteria. The deployed network consisted of one input layer, two hidden layers and one final output layer. The searched topologies included the number of neurons of the input and hidden layers, since the last layer was always confined to one neuron, because the defined output was the prediction of tomorrow's excess return.

They compared their results with classical time series prediction methods, namely autoregressive models, and concluded that ANN models are superior, due to being able to capture not only linear but also non-linear patterns in the underlying data. They also discovered that their ANN's performance is influenced by the way the weights are initialized, ultimately concluding that this step should be performed in terms of the mean and standard deviation of several randomly selected initializations.
Gorgulho, Neves and Horta [34] proposed a work that used a GA kernel to optimize technical analysis rules for stock picking and portfolio composition. Their work aimed to manage a financial portfolio by using technical analysis indicators as trading rules. The used TIs were EMA, HMA, ROC, RSI, MACD, TSI and OBV. The Dow Jones Industrial Average Index (DJI) was used as the selected market, giving the system user the possibility of choosing the data frequency: daily, weekly or monthly. For each trading rule, a score is assigned according to a specific set of hard-coded rules, different from TI to TI. Regarding this mechanism, 4 score categories were defined. A very low score indicates a strong sell/short signal, and a value of 1 is given to it. The same value is attributed to a very high score, indicating a strong buy signal. To low and high scores a value of 0.5 is attributed: a low score indicates an under-performing asset with potential to sell or to go short, while a high score indicates a reasonable buy signal. In this work the GA is used to optimize the classified trading rules. After the optimization performed by the algorithm, resulting in a classifier equation where a set of technical indicators is correctly balanced, all the assets within the market are classified with weights. In order to validate the developed solution, an evaluation was performed comparing the designed strategy against the market itself and several other investment methodologies, such as Buy and Hold and a purely random strategy. The testing period from 2003 to 2009 allowed the performance evaluation under distinct market conditions, culminating in the 2008-2009 financial crash. The results are promising, since the number of positions with positive return exceeds 80% for the GA, confirming the high confidence level of the proposed approach.
Regarding trading rule optimization, Hirabayashi et al. [35] proposed a GA system to automatically generate trading rules based on TI that, instead of trying to predict future trading prices, focuses on calculating the most appropriate trade timing. The training data used in this work consists of historical rates of the U.S. Dollar against the Japanese Yen (USD/JPY) and the Euro against the Yen (EUR/JPY). For each dataset, they used hourly closing prices. As TIs they selected the RSI, EWMA, Percent Difference from Moving Averages (PD) and the rising rate from one hour ago of the original data (RR). These indicators are used to generate trading rules. The GA tries to optimize the system by searching for the most profitable individual according to 3 conditional equations involving the mentioned TIs, establishing boundaries for each TI that determine whether a buy or a sell order is placed. The system results are compared with a Buy & Hold strategy and with a Neural Network strategy. They claim that their work surpassed these strategies, achieving an average profit of 17% for the EUR market and 80% for the USD market.

Summary of the most relevant related works:

| Work | Date | Application | Heuristic | Main goal | Input Variables | Data period | Performance |
|------|------|-------------|-----------|-----------|-----------------|-------------|-------------|
| [35] | 2009 | FX market EUR, AUD and USD against JPY | GA | Optimization of trading rules | Technical indicators | 2005 - 2008 - Hourly window | Approximately 15% ROI for AUD, 50% ROI for EUR, and 40% ROI for USD |
| [26] | 2000 | FX market AUD, CHF, DEM, GBP and JPY against USD | ANN | Forecasting FX market - Regression | Time-series data and technical indicators | 1993:11 - 1995:07 - Weekly window | 29.45% ROI and 0.043 NMSE for the USD/CHF market |
| [34] | 2011 | DJI 30 stocks | GA | Technical indicator rules optimization | Technical indicators | Not specified | ROI 25.29% |
| [27] | 2011 | Istanbul Stock Exchange (ISE) National 100 Index | SVM and ANN | Market trend binary prediction | Technical indicators | 1997:01 - 2007:12 | 75.74% ACC for ANN and 71.24% ACC for SVM |
| [29] | 2016 | Japanese Nikkei 225 index | ANN, Fuzzy Surfaces and GA | Forecasting stock market - Regression | Fundamental indicators | 1993:11 - 2007:07 - Monthly window | 0.0043 |
| [28] | 2015 | BSE Sensex and CNX Nifty | SVR-ANN, SVR-RF, SVR-SVR | Forecasting stock market - Regression | Time-series data and technical indicators | 2003:01 - 2012:12 - Adjustable window | 139.39 MAE, 3.41 RMSE for CNX Nifty and 449.75 MAE and 3.34 RMSE for BSE Sensex |
| [31] | 2007 | Curated list of industries | GA-SVM | Bankruptcy classification | Bankruptcy data | Monthly window | 76% Accuracy |
| [32] | 2017 | DJI 30 stocks | GA-ANN | Buy-sell-hold points prediction | Technical indicators | 1997:1 - 2007:1 - Daily window | 22.4% ROI for the MLP+GA approach |
| [33] | 2012 | Bombay Stock Exchange (BSE) | GA-ANN | Daily return prediction | Fundamental and technical indicators | Daily window | Mean excess returns - 1.026% |
| [30] | 2018 | S&P 500 volatility index | RAF, ANN, LOGIT and LSTM | Prediction of directional movements | Fundamental and technical indicators | 1992 - 2015 - Daily window | 0.46 Daily return |
Chapter 3

Implementation

3.1 Model overview

This section provides a general overview of the solution that was developed. The overall approach to market prediction is decomposed into small modules, where each part has a distinct role in the deployed model, using different methodologies and technologies useful for achieving the desired result. The presented model results from the combination of concepts that were introduced in the previous sections, namely FNN and GA. All the code was written using Python3 [36], due to the great support, accessibility, speed and available packages related to this work. We can summarize the developed system workflow in the following steps:

1. User input: The user starts by providing the initial configuration data to the system, specifically the FOREX market data, model parameters, GA settings and desired TA features.

2. Feature calculation: After specifying the initial parameters and configurations, the system calculates the desired TA features from the provided data.

3. Optimization: In this module the GA is used to find the best optimized version of the FNN model. According to the characteristics of each generated individual, data is fetched from the produced feature data. Models are evaluated, and the best one is selected according to the chosen fitness function.

4. Model prediction: After selecting the best optimized model, the testset is used to make a prediction.

5. Market simulation: The outputted prediction is used to create a market strategy. The market strategy is evaluated using financial metrics.

The complete sequence of processing can be seen in fig 3.1, where each different layer is depicted, to be further explained in the following sections.

Figure 3.1: System workflow

3.2 User input

The user input module is responsible for the interaction between the user and the system. We created a Python configuration file where several system parameters can be configured, enabling the creation of different system architectures. The user can configure the following parameters, divided into 5 different sections (an illustrative configuration sketch follows the list):

• Data: In the configuration file, the final path to the file where the market records are available must be specified in the path parameter.

• TI features: It is possible to select which TI are going to be used by enabling them in the configuration file. All the usable indicators have a parameter corresponding to their acronym. To enable them, the user must simply set the desired ones to TRUE. We also specified two additional parameters: upper bound and lower bound, which set the maximum and minimum time periods for which each indicator is going to be created.

• GA: The GA parameters configure the behavior of the evolutionary operators and the number of created chromosomes. The initial population is controlled by the pop parameter, which receives an integer corresponding to the number of desired individuals. The same applies to the ngen parameter, which controls the number of generations used as the GA termination criteria. The remaining parameters control the selection, mutation and crossover operations. Regarding selection, the tournsize parameter configures the number of selected individuals for Tournament Selection (section 2). For mutation and crossover, mutp and cxpb were created, representing the probability of applying mutation and crossover to a selected individual. Since mutp and cxpb are probabilities, they should be defined by a float in the range from 0 to 1. Finally, we have a parameter that controls the used fitness function: the fitness func parameter receives a string corresponding to the name of the fitness function to be used (section 3.4.3).

• Neural Net: This section of the configuration file concerns parameters that control the behaviour and architecture of the neural network. Activation receives a string corresponding to the activation function used by the hidden layers of the network. Epochs receives an integer corresponding to the number of epochs selected for training the network. Then we have batch size, which indicates the number of samples that are going to be propagated through the network in each forward and backward passage, and the network optimizer, a string representing the optimizer used in the backpropagation process. Still regarding the network's internal architecture, we also deployed the Batch Norm parameter.

This parameter controls the usage of a Batch Normalization process on the second layer of the network, by being set to TRUE or FALSE. Details about this procedure are explained in section 3.5, while in chapter 4 its introduction in the model (or not) is discussed throughout the experimental analysis conducted in different FOREX markets. There are also two more relevant parameters that concern internal data usage, and which greatly influence the system performance. Train size specifies the percentage of the dataset that is going to be used for training the network. Since it is given as a percentage, the system is able to automatically infer the size of the data used for testing. Validation size indicates the portion of the training data used to validate the system performance.

• Investment: The investment parameters reflect how the system is going to invest in the Market Simulator module. 3 parameters were created. Initial capital indicates how much money the user has to invest, while asset number indicates the number of assets to be acquired upon a transaction. Finally, transaction cost simulates the usual costs involved in sell/buy transactions between brokers and traders.
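For illustration, a configuration file following the description above could look like the sketch below. The concrete names, underscores and default values are assumptions, not the exact layout shipped with the system:

```python
# Illustrative config.py sketch (assumed names and values)

# Data
path = "data/EURUSD_hourly.csv"          # hypothetical market file

# TI features
RSI, EMA, MACD = True, True, False       # enable/disable indicators
lower_bound, upper_bound = 2, 30         # min/max TI time periods

# GA
pop = 50            # population size
ngen = 20           # number of generations (termination criteria)
tournsize = 3       # tournament selection size
mutp = 0.1          # mutation probability
cxpb = 0.7          # crossover probability
fitness_func = "accuracy"

# Neural Net
activation = "relu"
epochs = 100
batch_size = 32
optimizer = "adam"
batch_norm = True
train_size = 0.8
validation_size = 0.2

# Investment
initial_capital = 10000
asset_number = 1000
transaction_cost = 0.0002
```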

3.2.1 Data

The system is prepared to receive time series market data that respects some feature constraints imposed by the created solution. The path specified in the config file should point to a .csv file with periodic information from a selected FOREX market. It should comprise the following fields:

• Date: The date corresponding to each available record. The system accepts different data periods, but smaller frequencies, such as tick, minute or hourly data, are recommended to achieve more interesting results. Daily data is not recommended, since the majority of the available FOREX datasets usually start in 1999, which corresponds to a small number of records, insufficient for extracting reliable insights out of the system.

• Open: The open price rate at the beginning of the given time period.

• Close: The close price rate at the given time period.

• High: The highest achieved rate at the given time period.

• Low: The lowest achieved rate at the given time period.

Note that the provided file columns must respect the above-mentioned order. Data will be stored in the dataframe format provided by the Python library Pandas [37]. Pandas is an extremely popular Python package for Data Science that offers powerful tools to manipulate, analyze and store data. A dataframe is a flexible way to store data in a two-dimensional data structure aligned in a grid fashion, composed of rows and columns. Rows correspond to the existent data records, while columns include the variables collected at a specific data record (in this case at a specific date period). Pandas dataframes can also be effortlessly manipulated due to the large set of available operations. They work in such a way that accessing information does not require iterative loops, making splitting and transformation operations flexible and efficient. Fig 3.2 shows an example of FOREX EUR/USD market data stored in a Pandas dataframe, in a raw stage, i.e., without any additional financial features.

Figure 3.2: Raw input data
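Loading such a file into a dataframe is a one-liner; the file name below is hypothetical:

```python
import pandas as pd

# Assumed file layout: Date, Open, Close, High, Low (section 3.2.1)
df = pd.read_csv("data/EURUSD_hourly.csv",
                 parse_dates=["Date"], index_col="Date")
print(df.head())
```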

3.3 Feature calculation

This is the module that adds and calculates TA features using the initial given data. As we previously
specified in section 2.1.3, this corresponds to the addition of a set of different TI that use past information to
generate new signals. This operations will be performed using PyTi [38], a Python library that contains various
financial TI that can be used to analyze data.
Throughout the system workflow, feature data will be accessed several times. With that in mind, we chose to calculate each feature only once, at the beginning of the program execution, avoiding repeated calculations. Calculated features are stored in a folder that contains an individual .csv file for each feature. The lower bound and upper bound parameters initially selected in the configuration file set the number of columns of the created file, with each column corresponding to the TI calculation, accounting for a different number of past periods. The selected range should be wide enough to comprise small, medium and large time periods, but also not excessively large, in order to avoid the creation of a too big search space, which would greatly increase the cost of the system optimization task.
To facilitate the TI creation, we also divided the used indicators into 2 different types: normal TI and special TI. Normal TIs are features that take only 1 parameter, hence the above-explained methodology is applied. Special TIs take more than one parameter, with values also contained in the same used range. The creation of a .csv file for those indicators would be too time-consuming, inevitably creating big data structures that would require great amounts of memory. We opted to calculate them on the fly, since the system uses a smaller number of special TIs when compared to normal TIs. Regarding the used TIs (section 2.1.3), we separated them in the following order:

• Normal TI = [EMA, SMA, RSI, MOM, ATR, ATRP, BB, ADX, AA, CMO, DPO, DEMA, ROC, DSS, KURT, SKEW, STD, STV]

38
• Special TI = [CCI, MACD, PO]

The system has the ability to automatically separate the 2 types of TI. Some of the indicated TIs condense more than one indicator, but for ease of interpretation we chose to represent them with only one symbol. For example, the AA indicator is divided into 2 separate indicators, AA up and AA down. These correspond to the 2 bands that are used to create the AA indicator, and their calculation relies on 2 different formulas. Therefore, they are processed by the system as 2 normal TIs, with different input parameters. Figure 3.3 shows an example of the csv created for the SMA indicator.

Figure 3.3: SMA csv

3.4 Optimization

The optimization layer is the core layer of the created system. In it, the previously provided market data goes through several different procedures in order to create an individual that has the best possible performance according to the defined fitness function. As explained in section 2.2.2, individuals are represented by array structures called chromosomes. In this layer, the generated chromosomes are used for two main purposes: first, to join input data with calculated features; second, to create the individual FNNs where the grouped data is going to be inputted.

Several individuals in the form of FNNs are then created, with each one of them going through the evolutionary process computed by the GA. Predictions made by each FNN are evaluated by the fitness function, and the process is repeated until a stopping condition is met. At the end, the best individual is returned. Fig 3.4 depicts the overall procedure that is used in the optimization layer.

Figure 3.4: Optimization layer

3.4.1 Population generation

The first step to be performed in the optimization layer is the creation of a population of individuals. The
size of each generated individual varies according to the number of initially selected TI features, corresponding
to a chromosome whose genes take values from lower bound to upper bound (section 3.3). Besides using genes
as the number of past TI periods, each chromosome also encodes two more types of genes, namely presence
and neural network genes. Both have different purposes and represent the 3 main tasks performed by the GA
(a decoding sketch is given after this list):

• TI creation: The first main functionality was already explained, and is TI creation, which is achieved
through the usage of TI genes as feature parameters. Gene values codify the number of desired past
periods, and are randomly selected by the GA when an individual is initialized.

• Feature selection: Feature selection is a key mechanism in ML solutions. The main idea is to select
only a relevant set of features, promoting the model generalization capabilities without compromising its
correctness, thus reducing the existent model variance. Therefore, unnecessary, irrelevant, and insignif-
icant attributes that do not contribute to the accuracy of the predictive model are removed. The system
attempts to achieve this goal by using the previously mentioned presence genes available in each chro-
mosome. For each one of the TI parameter genes, there is a corresponding presence gene that
indicates whether the respective feature will be used in the model creation or not. Each presence gene also
codifies a value between lower bound and upper bound, but the main difference is that the system inter-
prets it as a binary presence flag. This is done by using the statistical median of the data population: if
the codified value is higher than the median, the feature will be used in the model creation, otherwise it
will be discarded, thus reducing the number of used attributes.

• Neuroevolution: Neuroevolution is a term applied to models that use evolutionary capabilities, such
as GAs, to evolve through time in order to achieve optimized models. In this system, this is done by
encoding 2 parameters that correspond to the number of neurons present in the input and hidden layers of
the network. Hence, during the initialization process, individuals will assume different FNN architectures,
with different network topologies. For deploying this mechanism, two additional neural network genes
were used, which also codify values between lower bound and upper bound, giving each layer a
value in that range.
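A minimal sketch of how such a chromosome can be decoded is shown below, assuming the layout of fig 3.5 (TI parameter genes first, then presence genes, then the 2 neural network genes); the function name, and the choice of the presence-gene values as the population over which the median is taken, are our reading of the text.

```python
import numpy as np

# Hypothetical chromosome decoding: the first k genes are TI parameters,
# the next k are presence genes, and the last 2 are neural network genes.
def decode_chromosome(chromosome, n_features):
    ti_genes = chromosome[:n_features]                    # past-period parameters
    presence_genes = chromosome[n_features:2 * n_features]
    nn_genes = chromosome[-2:]                            # input / hidden layer sizes

    # presence genes act as binary flags: a feature is kept only if its
    # gene exceeds the median of the presence-gene values
    median = np.median(presence_genes)
    selected = [i for i, gene in enumerate(presence_genes) if gene > median]
    return ti_genes, selected, nn_genes
```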

A possible chromosome structure of the generated individuals is depicted in fig 3.5.

Figure 3.5: Chromosome structure

As explained in section 2.2.2, the number of created individuals is going to correspond to the value of the
population size. New individuals are going to be successively created throughout the GA running time,
according to the number of generations initially set, with each generation being based on
the previous generation's fittest elements.

3.4.2 Model creation

After initializing a population of individual chromosomes, the system is ready to create a dedicated FNN for
each individual, considering the GA selected parameters. This is the part where data is grouped and the model
topology is selected according to each gene value. Regarding data grouping, there are some procedures
that need to be performed in order to correctly use the existent data - data preprocessing. By the end, the
model outputs a prediction vector, which corresponds to a trading strategy based on the approximated function
discovered by the model. Later on, this prediction is going to be used to assess the quality of each
individual.

X matrix creation

The first step of the model creation is the establishment of a feature matrix, generally called X matrix,
where each x input (or model feature) is present. As explained in section 3.3, only special TI features will be
calculated at this stage, with the remaining ones being selected from the initially created feature csv files. Thus, generated
features will be appended to the initially provided data (which contains the Date, Open, Close, High and Low
features), to create a new dataset with a higher dimensionality. Features will be appended or not according to
the values present in each presence gene. Regarding the train test split explained in section 2.2, the created
matrix will be split in Xtrain and Xtest , with the splitting point being initially set by the system user. The Xtrain
matrix is further decomposed in order to generate the Xval matrix, which corresponds to a percentage of the
trainset (selected by the user in the config file), used to evaluate the model performance during the optimization
period. The Xtest matrix is preserved to be later used with the optimized model.

Y vector creation

Since in this thesis we structured the prediction problem as a Supervised Learning one, the creation of
a vector that contains correctly labelled predictions is needed. This is what is present in the y vector, with each
position corresponding to the market rate return at time t. Signal variations are achieved by calculating the
return, i.e, comparing the closing price at time t with the closing price at t − 1, being characterized as pos-
itive or negative. Since financial data exhibits a non-stationary behavior, this procedure is crucial to ensure
that the data used in the train and test sets lies on the same joint distribution, which is guaranteed by the Gaussian
nature of financial returns [39]. To create discrete labels instead of the continuous values given by this calculation,
a binarization process by threshold is performed. If the calculated return is positive, a label of 1 is created;
otherwise, a 0 is given, meaning that the return at time t is negative. By deploying this methodology, the model
attempts to predict future market returns instead of forecasting future rates. We find this approach much more
useful, since one cannot expect the created model to accurately forecast a continuous value that
correctly expresses market rates. From the trader perspective, it is much more profitable and beneficial to only
know when it is a good time to invest in the market, or when one should stay out of it. Equation 3.1 represents
the binarization process used to create the y vector.

y_t = \begin{cases} 0 & \text{if } \frac{Close_t - Close_{t-1}}{Close_{t-1}} \leq 0 \\ 1 & \text{if } \frac{Close_t - Close_{t-1}}{Close_{t-1}} > 0 \end{cases} \qquad (3.1)

It is important to notice that we are attempting to forecast the next period market variation, and not the present
market variation at time t. Therefore, for each set of features at time t, the constructed label corresponds to
equation 3.1 for t + 1. This is performed by using the above formula for each row of the dataset and, after it,
shifting the created y vector one row above. Similarly to what is done in the creation of the X matrix, a split
into train, validation and test is also done on the created y vector, giving rise to ytrain , yval , and ytest .
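A sketch of this label construction, with pandas standing in for the system's actual code, is shown below; the function name is ours.

```python
import pandas as pd

# Label construction following equation 3.1: binarize the hourly return
# and shift it one row up, pairing the features at time t with the
# direction of the t+1 return.
def make_labels(close: pd.Series) -> pd.Series:
    returns = close.pct_change()          # (Close_t - Close_{t-1}) / Close_{t-1}
    labels = (returns > 0).astype(int)    # 1 for a positive return, 0 otherwise
    labels = labels.mask(returns.isna())  # keep NaN where the return is undefined
    return labels.shift(-1)               # the trailing NaN is dropped in cleaning
```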

Data cleaning

The stage of data cleaning is needed in order to eliminate non-desirable values from data.
By performing all the several computations needed for feature calculation, undefined or unrepresentable
values emerge in each data column. This is due to the formulation used to calculate TIs, which considers
the past periods as initial parameter. For example, let us consider a SMA that uses the n past hours as initial
parameter. This will necessarily cause the first n − 1 values present in the SMA feature column to become
undefined, being treated as not-a-number values (NaN). Since the SMA TI calculates a simple average over
the n past periods, periods with t inferior to n cannot be used, thus being populated with NaNs. A similar
problem arises when the y vector is being created: by shifting the y vector one row above, making each
row at time t correspond to the next period t + 1 market return, the last value of the y vector will also be
treated as NaN. These are values that cannot be processed by the neural network, thus their removal
is compulsory. The chosen procedure was to drop the rows where NaNs were encountered. This is an acceptable
approach when dealing with data sampled from financial markets, since other well known practices, like value
interpolation, replacement by mean and replacement by median, would introduce bias into system data.

Feature Normalization

To improve the overall model convergence, feature scaling is a needed measure. When data includes a high
number of dimensions, the values assumed by each individual feature can present a different range of values,
i.e, different levels of variance. This is undesired during the FNN training phase. Although it is possible for NNs
to naturally adapt to such heterogeneous data, the existence of such dispersed features makes training much
more difficult and time consuming. To overcome this problem, a normalization method called standardization
(or z-score) was applied. This procedure is individually applied to every dataset feature, making each one of
them have zero mean and unit variance, following the properties of a normal distribution. This helps the
convergence of the learning process, more specifically the Gradient Descent algorithm, explained in section
2.2.1. If features on considerably different scales are used during the application of Gradient Descent, gra-
dients will not take a direct path towards the global minimum due to the shape assumed by the cost function,
which can result in a slow training procedure or even make the system get stuck at a local minimum [40].
Considering µi as the mean value of feature i, and σi as the standard deviation of feature i, we can formulate
the standardization formula as:

x_t = \frac{x_t - \mu_i}{\sigma_i} \qquad (3.2)

Where xt represents each value present in the feature vector i. An important note about the standardization
process is the way it is performed. In order to correctly use this normalization procedure, the above
mentioned formula needs to be applied first only to the values present in the Xtrain matrix. We exclude the test
set to prevent a well known problem called look-ahead bias. This is a type of model bias that emerges when
assumptions regarding the testset are used in the training period, ultimately conducting to a high performance
model where the achieved results are not trustworthy. This would be the case if one tried to standardize the
whole dataset, with bias being leaked into the trainset in the form of µi and σi . To get around this issue, the values
of µi and σi should be drawn only from the trainset.
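A minimal sketch of this train-only standardization could be:

```python
import numpy as np

# Standardization (z-score) fitted on the train set only: mu and sigma are
# drawn exclusively from X_train and reused on validation and test,
# avoiding look-ahead bias.
def standardize(X_train, X_val, X_test):
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant features
    return ((X_train - mu) / sigma,
            (X_val - mu) / sigma,
            (X_test - mu) / sigma)
```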

FNN creation

Now that all the needed information is preprocessed and ready to be utilized, it is time to create each indi-
vidual FNN, according to each initialized individual. The model was created using the Python Deep
Learning library Keras [41], which contains all the tools needed to easily build any type of neural net. Data is
fed into a 3 layer structured model that forecasts market returns as a binary vector of predictions ŷ. The model
is trained using the Xtrain and ytrain matrices, and tested with Xval and yval . The validation accuracy will be
the optimized metric during the backpropagation process. In each epoch run, the FNN is going to train and
test the network, continuously calculating the train and validation accuracy. The run with the highest value of
validation accuracy is going to be saved as the optimized neural network. In it, the Xtest vector will be used to
evaluate the behaviour of the network on new, never seen information. In section 3.5, we explain in detail how
the FNN model is created during each system stage.

3.4.3 Fitness computation

The fitness function is the evaluation mechanism deployed in the GA. It takes as input a candidate
solution (a vector of predictions ŷ), and outputs how fit the provided solution is in relation to the function. After
the FNN creation, fitness is going to be assessed for each created population chromosome. Therefore, due to
the high number of repetitions that are going to be done, the fitness function should be implemented rigorously,
in order to not slow down the entire system. There is no strict rule in terms of selection of the fitness function.
It could simply be the FNN model cost function, a classification metric, a financial metric or any other custom
one. Therefore, one could choose to directly enhance the model performance, optimizing a metric used during
the learning process of the network, or indirectly improve the model capabilities with an outside metric, which
is not being internally calculated by the model but contributes to the model overall performance. We propose
2 different fitness functions, 1 directly related with the system performance, and the other related with market
profitability:

• Accuracy: Accuracy is a metric that is calculated throughout the model learning process. It gives the per-
centage of correct predictions by comparing the correctly labelled vector yval or ytest with the prediction
vector ŷ. It directly impacts the model performance.

accuracy = \frac{1}{N} \sum_{i=1}^{N} \left(1 - \frac{y_i - \hat{y}_i}{y_i}\right) \times 100 \qquad (3.3)

With N being the total number of samples to be predicted, yi the individual values of the y vector, and ŷi
the individual values of the ŷ vector.

• Return of investment: This is a financial metric that measures the efficiency of the performed invest-
ments, giving a result that can be translated into a measure of the achieved gains in terms of percent-
age. Contrary to the last presented formula, ROI is a metric that is not directly applied in the model
calculation, being used after the model prediction, which means that its optimization will affect the FNN
in an indirect way.
ROI = \frac{Returns - InitialInvestment}{InitialInvestment} \times 100 \qquad (3.4)
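As a sketch, the two fitness candidates could be written as follows; for accuracy we use the standard fraction-of-correct-predictions reading of equation 3.3, and the function names are ours.

```python
import numpy as np

# Two candidate fitness functions: percentage of correct predictions, and
# return of investment following equation 3.4.
def accuracy_fitness(y_true, y_pred):
    return np.mean(np.asarray(y_true) == np.asarray(y_pred)) * 100

def roi_fitness(returns, initial_investment):
    return (returns - initial_investment) / initial_investment * 100
```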

3.4.4 GA operators

This section covers all the blocks depicted in fig. 3.4 that are used by the GA as evolutionary operators:
stop condition, selection, mutation and crossover. We took advantage of the DEAP library [42], which en-
ables the individual creation of each evolutionary operator, offering a great set of possible configurations. The
implementation regarding each one of them was the following:

• Stop condition: To terminate the ongoing evolutionary optimization process, a certain criterion must be
met. We simply selected the number of generations as the GA stopping criterion. Until reaching the final
number of generations, the GA continues the breeding process created by the developed evolutionary
operators. Once the condition is met, the GA proceeds and outputs the best individual that existed
during the running period.

• Selection: To perform the selection phase, we selected as operator the Tournament Selection method
(section 2.2.2). The process is repeated a number of times equal to the size of the initially created
population.

• Mutation: The mutation operator is responsible for mutating one or more genes according to a certain
mutation probability. Since each created individual encapsulates different types of genes, presence and
TI parameter genes, a simple mutation process would not be adequate for this problem. Therefore,
we developed a custom mutation that cuts the selected individual in half and performs the mutation
operation on the first half of the chromosome genes, i.e, where the TI parameter genes are included. We
do this because we do not want to change the available presence genes, present in the second half of
the chromosome. The mutation process is done by sampling an integer uniformly drawn between a lower
and upper bound equal to the ones defined for the TI parameters.

• Crossover: The crossover operator is used to combine different individuals by mixing each chromosome's
genes. Similarly to what happens with the mutation operator, the crossover must be redesigned in order
to avoid a possible blend between presence and TI parameter genes. We created a custom crossover that
receives two individuals, divides each one of them in half, and applies a multi-point crossover methodol-
ogy in each split. The final result is given by 2 new individuals with genes switched between them, picked
according to 2 randomly selected crossover points in each half. A sketch of both custom operators is
given after this list.
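The following sketch illustrates the two custom operators under the stated assumptions (list-based DEAP individuals whose first half holds the TI parameter genes); the function names are ours, and DEAP's cxTwoPoint stands in for the per-half multi-point crossover.

```python
import random
from deap import tools

# Hypothetical custom mutation: mutate only the first half of the
# chromosome (the TI parameter genes), each gene with probability indpb.
def half_mutation(individual, low, up, indpb):
    half = len(individual) // 2
    for i in range(half):
        if random.random() < indpb:
            individual[i] = random.randint(low, up)
    return individual,

# Hypothetical custom crossover: two-point crossover applied separately
# to each half, so presence genes never blend with TI parameter genes.
def half_crossover(ind1, ind2):
    half = len(ind1) // 2
    ind1[:half], ind2[:half] = tools.cxTwoPoint(ind1[:half], ind2[:half])
    ind1[half:], ind2[half:] = tools.cxTwoPoint(ind1[half:], ind2[half:])
    return ind1, ind2
```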

3.5 Model prediction

After the termination of the GA procedure, we obtain an optimized set of parameters, given by an optimal
individual, represented by the fittest chromosome according to the selected fitness function. Since this individ-
ual could be present anywhere in the GA search space, it was not feasible to store the data generated by each
individual. Hence, a final prediction is made using the parameters of the optimized version of the system. This is
what this module aims at. This final prediction is performed by the optimized FNN, i.e, the fittest individual
among the created population. The prediction itself derives from the application of Xtest and ytest (which
had both been held out until this procedure), instead of Xval and yval as was done during the optimization
process. The architecture of the developed system is composed of 3 individual layers. The first layer, the input
layer, is heavily shaped by the chromosome genes. Its size is selected by the penultimate chromosome gene,
and it cannot be inferior to the number of columns present in the provided data, in this case the number of
features picked by the GA plus the number of initial data features. If the GA selects a number inferior to it, this
value is going to be used as the number of neurons in the first layer. Activations will be performed by the ReLU function.
The technical details of this function are presented in section 2.2.1.
The second layer is technically similar to the first one, with the exception of the number of units. The number
of neurons will be selected according to the value of the last neural network gene. Before passing the output
from the first layer to the second one, a Batch Normalization process can be applied to each mini-batch,
with batch size and Batch Normalization usage both being selected and initialized in the configuration file.
Batch Normalization is a process that aims to recreate the initial data normalization process, explained in
section 3.4.2, in the hidden layers of the neural network. The general idea is that if standardizing the initial
inputs of the network improves the overall model convergence, then applying the same principle to the middle
layers of the created model will also be helpful. In fact, Batch Normalization speeds up the training process and
minimizes the data distribution changes across layers by forcing the mean and variance through standardization.
The main idea is to reduce the Internal Covariate Shift, i.e, the change in the distribution of network activations
across different network layers [43].
Finally, the network is also composed of a final third layer. Since the prediction targets are binarized returns,
the number of neurons in the last layer is unchangeable, and is permanently set to 1. The activation function
in the last layer neuron must account for that, and the predictions of the network must be forced to 0 or 1.
Considering the binary classification task that will be performed by the FNN, a sigmoid function (section 2.2.1)
was selected. As loss function, the binary cross entropy was chosen, which is no more than a special case
of the categorical cross entropy introduced in section 2.2.1. Regarding the network optimization procedure
explained in section 2.2.1, the optimizer field in the configuration file is going to determine the selected opti-
mization algorithm to use in backpropagation. Keras provides a set of different optimizers, such as Stochastic
Gradient Descent and other improved versions of it. Still concerning the network data fitting procedure, it is
worth noticing that Keras forces each epoch to use the train and validation sets. Therefore, in each passage,
the network is fully trained with the train set and fully tested with the validation set, making it possible to check
in real time how fitted to the received data the network is.
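Putting the three layers together, a minimal Keras sketch of this architecture could be the following; the function and parameter names are ours, not the system's.

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

# Sketch of the 3-layer FNN: n_inputs and n_hidden come from the two
# neural network genes, use_batch_norm mirrors the configuration flag.
def build_fnn(n_features, n_inputs, n_hidden, use_batch_norm=True):
    model = Sequential()
    # the input layer can never be smaller than the number of input columns
    model.add(Dense(max(n_inputs, n_features), activation="relu",
                    input_dim=n_features))
    if use_batch_norm:
        model.add(BatchNormalization())        # normalize activations per mini-batch
    model.add(Dense(n_hidden, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))  # binarized return prediction
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```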

Figure 3.6: Final prediction pipeline

Besides the architecture specified above, we also introduce two other mechanisms that attempt to con-
tribute to the improvement of model convergence. The first one is the model checkpointer, which saves the
model after every epoch only if the monitored metrics, loss or accuracy, have improved since the last epoch.
This is done by storing the network weights in a hdf5 file, which is later used to load the network again and
output a prediction. The other used mechanism is early stopping. Early stopping greatly improves the
computation time, since it aims to stop the learning process whenever there is no improvement in the last x
epochs, with x being referred to as the patience threshold. The combination of these two tools makes the model
able to simultaneously filter non-desirable results and at the same time return the best seen element.
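A sketch of how these two mechanisms can be wired into Keras, reusing the model from the previous sketch, is shown below; the file name is hypothetical and the monitored metric name ("val_acc") follows the Keras versions of the time.

```python
from keras.callbacks import ModelCheckpoint, EarlyStopping

# Checkpointer: keep only the weights of the best epoch seen so far.
# Early stopping: halt after `patience` epochs without validation improvement.
callbacks = [
    ModelCheckpoint("best_model.hdf5", monitor="val_acc",
                    save_best_only=True),
    EarlyStopping(monitor="val_acc", patience=20),
]
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=100, batch_size=32, callbacks=callbacks)
```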

3.6 Market simulation

The market simulation module is a module designed to test market trading strategies in a simulated envi-
ronment. It receives the prediction made by the optimized model and tests it against the market for which it was
designed. We defined 2 main market positions, long and short. When the simulator orders a long position,
the system invests in the market and purchases a predefined number of assets. Going short means exactly the
opposite, with the system selling first and buying when the position is closed (section 2.1). The applied strategy
will be defined by the binary returns that are present in the predictions array. When the next period return, t + 1,
is positive, we have at time t a label with value 1, which indicates proper investing conditions. When t + 1 is
negative, the system labels time t as 0, meaning that the next period is not suitable for market investments. To
create the final market strategy, a new signal derived from the model predictions vector must be generated
(a sketch of this construction is given after the column list below).
In order to simulate the transition from long to short investing, a third label, -1, is then created. Therefore, the
system investment procedure condenses 3 different states: 1 represents a long position, -1 represents a short
position and 0 stands for a non-investing behaviour. Before investing, 2 parameters must be set: the initial
capital and the number of assets to be bought or sold when a long or short position is requested, respectively
defined as initial capital and asset number in the configuration file. Their values should be established and
varied according to the specified market. To showcase all the calculations and system behaviour during the
market simulation, a csv with 9 columns is created, with each column being:

• Date - The date of the respective market period.

• Close - The close price on a given date.

• Prediction - The signal generated by the optimized FNN.

• Signal - The generated market strategy. Represents the difference between period t + 1 and period
t of the prediction column. This generates the 3 above-mentioned labels: 1 - long, -1 - short and 0 - hold.

• Positions - Number of positions to be opened according to the trading signal. This number is pre-defined
in the configuration file, and when an investment is performed the number of transacted assets is always
equal to it, with its sign varying according to the signal column value. If 1, positions is going to be a
positive value, and if -1 a negative value. When 0, the value remains equal to the previous one, since
no transactions are performed.

• Positions value - Indicates the value of all the opened or in-debt market positions. It is calculated
by multiplying the closing market value by the number of acquired assets, i.e, the values in the
Close column by the Positions column.

• Cash - Indicates the total amount of money owned by the user, from the beginning of the market simulation
until the end of it. This quantity is going to vary according to market investments.

• Total - The total value owned by the user. This combines the money that the user has at a
specific time period, plus the combined value of open market positions.

• Returns - The market return for a given time period t. Useful to identify if the system correctly purchased/sold
in that period.
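A sketch of how these columns can be derived from the prediction vector is given below; it reflects one possible reading, in which the system is long while the prediction is 1 and short while it is 0, omits transaction costs, and uses names of our choosing.

```python
import pandas as pd

# Hypothetical reconstruction of the simulation csv from the predictions.
def build_simulation(close, predictions, n_assets, initial_capital):
    sim = pd.DataFrame({"Close": close, "Prediction": predictions})
    sim["Signal"] = sim["Prediction"].diff().fillna(0)       # 1 long, -1 short, 0 hold
    sim["Positions"] = (2 * sim["Prediction"] - 1) * n_assets
    sim["Positions value"] = sim["Positions"] * sim["Close"]
    # cash falls when assets are bought and rises when they are sold
    trades = sim["Positions"].diff().fillna(sim["Positions"])
    sim["Cash"] = initial_capital - (trades * sim["Close"]).cumsum()
    sim["Total"] = sim["Cash"] + sim["Positions value"]
    sim["Returns"] = sim["Close"].pct_change()
    return sim
```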

Upon csv creation, the system is going to use the calculated values to assess the overall performance of
market investments. The financial metrics presented in section 4.2 were used for that purpose. Fig. 3.7 illustrates
the mechanism scrutinized throughout this chapter.

Figure 3.7: Market simulation

The above calculations account for the usual transaction costs charged by the broker. This cost can be
relative or fixed. Relative fees change according to the market volume size, and fixed fees are, as the name
indicates, fixed, with their value remaining equal regardless of the size and volume of the trade being placed.
In FOREX markets the commission fee is normally included in the market spread (section 4.1). Since in
this application we do not use bid and ask prices, making the spread calculation impossible (the available
data does not include these two fields since they change according to the chosen broker), we decided to
use an average fixed commission cost of 0.0001% per transaction, in order to cover all the commission plus
spread trading expenses.

Chapter 4

Results

4.1 FOREX Data

To evaluate the performance and robustness of the previously defined model, we tested market data from 5 different
FOREX currency pairs, each one described by date, open, close, high and low. As sample rate,
we chose hourly data, since the available FOREX data only starts in the year of 1999; a lower sample rate
would intrinsically mean a small amount of usable information, and deep learning algorithms such as FNNs
work better when fed with higher data quantities. All the tested FOREX datasets have the same row size,
comprising hourly data over the period of 12/03/2013 to 12/03/2018, which represents a 5 year period of
31167 trading hours. Experiments were performed in the following markets, with indexes depicted in figures
4.1, 4.2, 4.3, 4.4 and 4.5:
4.1,4.2, 4.3, 4.4 and 4.5:

• EUR/USD: The most popular currency pair among FOREX traders. It compares the 2 largest economies
in the world.

• GBP/USD: One of the oldest currency pairs, that puts the British Pound (GBP) against the US Dollar
(USD).

• GBP/JPY: This currency pair compares the British Pound (GBP) against the Japanese Yen (JPY), and is
known for its great volatility. It is considered as a cross currency pair (section 2.1.1), since the US Dollar
is not used to calculate the exchange rate.

• USD/JPY: Compares the US Dollar against the Japanese Yen. This market presents low interest rates,
which consistently makes it one of the most popular among traders.

• USD/CHF: Another major currency pair. It compares the US Dollar with the Swiss Franc (CHF), and it
is considered as a safe market due to its behavior during times of uncertainty, usually staying stable or
suffering some appreciation.

Figure 4.1: EUR/USD market index
Figure 4.2: GBP/USD market index
Figure 4.3: GBP/JPY market index
Figure 4.4: USD/JPY market index
Figure 4.5: USD/CHF market index

A descriptive summary of each market dataset is presented in table 4.1.

Table 4.1: Summary of market indices

Markets Start date End date Samples Mean Max Min Std Avg. candle

EUR/USD 12/03/2013 12/03/2018 31167 1.197 1.396 1.036 0.109 0.0016

GBP/USD 12/03/2013 12/03/2018 31167 1.470 1.717 1.202 0.143 0.002

GBP/JPY 12/03/2013 12/03/2018 31167 161.01 195.846 125.069 17.694 0.3127

USD/JPY 12/03/2013 12/03/2018 31167 109.743 125.683 92.739 8.019 0.0017

USD/CHF 12/03/2013 12/03/2018 31167 0.955 1.033 0.840 0.038 0.0014

We can confirm that the highest values of standard deviation are given by the markets that include the
Japanese Yen in the currency pair. Such values indicate evidence of high volatility in this type of markets, confirm-
ing the riskier profile that often describes them. We can further check this information by observing the range
created between the Min and Max columns for each market. However, a high volatility market does not imply
high volatility during hourly periods. This is shown by the last table column, Avg. candle, where an average of
the hourly market rate variation is computed for each tested market by subtracting the High and Low columns
for each dataset record. We also present a summary of market returns for each one of the selected markets.
We present this because the model attempts to learn and predict binarized returns instead of the actual market
index (section 3.4.2). Table 4.2 shows a summary of the analyzed market returns for each tested market.

Table 4.2: Summary of market returns

Markets Start date End date Samples Mean Max Min Std

EUR/USD 12/03/2013 12/03/2018 31166 −2.081 × 10−6 2.572 × 10−2 −2.035 × 10−2 1.114 × 10−3

GBP/USD 12/03/2013 12/03/2018 31166 −1.557 × 10−6 2.176 × 10−2 −5.69 × 10−2 1.159 × 10−3

GBP/JPY 12/03/2013 12/03/2018 31166 2.086 × 10−6 3.247 × 10−2 −8.468 × 10−2 1.566 × 10−3

USD/JPY 12/03/2013 12/03/2018 31166 3.847 × 10−6 1.442 × 10−2 −2.968 × 10−2 1.234 × 10−3

USD/CHF 12/03/2013 12/03/2018 31166 1.022 × 10−6 2.604 × 10−2 −1.402 × 10−1 1.438 × 10−3

4.1.1 Data statistics

Since we chose financial returns as the methodology of choice to transform raw market rates, we can compute
helpful statistics that provide us useful information about the profitability and risk associated to each selected
market. Kurtosis and Skewness (section ??) are 2 powerful statistical metrics that provide information about
the shape of the given data distribution, comparing it to a Gaussian distribution. This is possible considering
that the probability distribution of market returns is approximately normal. Table 4.3 presents the
Skewness and Kurtosis values of each market testset.

Table 4.3: Distribution shape descriptors

Markets Kurtosis Skewness

EUR/USD 22.415 1.038

GBP/USD 19.990 -0.288

GBP/JPY 12.841 -0.114

USD/JPY 9.033 -0.348

USD/CHF 9.917 -0.216

Positive kurtosis indicates excess kurtosis in market data. Since all the measured return kurtosis is above
0 (note that the used formulation in section ?? discounts 3), we prove the presence of heavy tails in the
distribution of events. This indicates a higher likelihood of extreme losses and gains. The Skewness parameter
represents how skewed data is. When combined with Kurtosis, it indicates which type of outliers is more likely
to be found (negative values indicate a left skew, and positive values a right skew). Since the Skewness measures
present values close to 0, we can conclude that the data distribution is almost symmetrical, and the likelihood of
achieving higher returns and losses is almost identical.
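These shape descriptors are straightforward to reproduce; as a sketch, pandas' kurt() already reports excess kurtosis (Gaussian = 0), matching the convention used here, and the file name below is hypothetical.

```python
import pandas as pd

# Reproduce the table 4.3 descriptors from the raw Close column.
market_df = pd.read_csv("EURUSD_hourly.csv")   # hypothetical file name
returns = market_df["Close"].pct_change().dropna()
print("Kurtosis:", returns.kurt())   # excess kurtosis, 0 for a Gaussian
print("Skewness:", returns.skew())
```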

4.2 Evaluation metrics

In order to evaluate and assess the quality of a created ML model, one must choose adequate evaluation met-
rics, capable of correctly expressing the model performance. Evaluation metrics vary according to the type of ML
problem that is being proposed. We could divide the used evaluation metrics in two different types: classifica-
tion metrics and financial metrics.

4.2.1 Classification metrics

Classification metrics are formulas that express the performance of ML models in classification tasks.
Since in this thesis the prediction problem is structured as a classification one, it is important to assess the
effectiveness of the created system under some classical classification criteria, in order to check how good
the model's predictive power is, and how well it performs the desired task. We chose the following metrics:

• Accuracy: Accuracy is by far the most used metric in classification tasks. It simply measures how fre-
quently a created classifier makes correct predictions. It is given by the proportion between the number
of correct predictions and the total number of predictions, therefore referring to the bias of the predictions.

Accuracy = \frac{tp + tn}{tp + tn + fp + fn} \qquad (4.1)

Note that the numerator of the above equation refers to the number of correct predictions made by
the model, decomposed in two main components. The first, tp, refers to the number of correctly identified
positive individuals (for example, in a binary classification, how many individuals with label 1 were correctly
identified), and tn to the number of correctly identified negative individuals (how many individuals with label 0
were correctly identified). Regarding the denominator, fp represents the number of incorrectly identified
individuals and fn the number of individuals incorrectly rejected.

• Precision: Precision is a metric that measures how good a classifier was at the task of classifying
positive elements. We could say that precision answers the question, ”How many selected items are
relevant?”, being interpreted as the ratio of positive elements which were correctly identified.

Precision = \frac{tp}{tp + fp} \qquad (4.2)

• Recall: Like precision, recall is a metric that complements accuracy and gives deeper insight into a
classifier's performance. Recall answers the question ”How many relevant items are selected?”, i.e,
among all the existent positive items, how many were retrieved by the system.

Recall = \frac{tp}{tp + fn} \qquad (4.3)
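Equations 4.1 to 4.3 can be computed directly with scikit-learn over binary label vectors such as the ones produced by the FNN; the small vectors below are only illustrative.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Illustrative label vectors: tp=2, tn=1, fp=1, fn=1
y_test = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy_score(y_test, y_pred))    # 0.6  -> (tp+tn)/(tp+tn+fp+fn)
print(precision_score(y_test, y_pred))   # 2/3  -> tp/(tp+fp)
print(recall_score(y_test, y_pred))      # 2/3  -> tp/(tp+fn)
```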

4.2.2 Financial metrics

Besides assessing the predictive capacity of the system with classification metrics, it is equally important to
interpret the achieved results through traditional trading measurements. Since we are dealing with markets, and
more specifically with investments in the FOREX domain, the usage of performance measures that evaluate
the efficiency of the built model in terms of gains and losses is a naturally logical procedure to assess the overall
quality of the model. We used the following:

• Return of Investment: Measures the efficiency of the performed investments, giving a result that could
be translated into a measure of the achieved gains in terms of percentage.

ROI = \frac{Returns - InitialInvestment}{InitialInvestment} \qquad (4.4)

• Maximum Drawdown: This metric aims to produce a very close approximation of the risk asso-
ciated with an investment, measuring the peak-to-trough decline during a specific period of investment.
Drawdowns are calculated as the difference between the highest local maximum and the subsequent lowest
local minimum. The drawdown recording period runs until the occurrence of a new local maximum.

MDD = \left| \frac{TroughValue - PeakValue}{PeakValue} \right| \qquad (4.5)
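A sketch of this computation over an equity curve (for example the Total column of the market simulation) could be:

```python
import numpy as np

# Maximum Drawdown following equation 4.5: track the running peak and
# report the worst peak-to-trough decline seen over the whole curve.
def max_drawdown(equity):
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    drawdowns = (equity - running_peak) / running_peak
    return abs(drawdowns.min())   # e.g. 0.40 for a 40% maximum drawdown
```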

4.3 Experimental setup

Before presenting any results related to each market, it is necessary to configure the developed system.
Since there are many configurable parameters (section 3.2), we decided to create a standard setup that is
going to serve as baseline configuration for the set of experiments to be performed. We selected the following
model setup:

Table 4.4: System parameters

Parameter Value Component

TI upper bound 100 TI features

TI lower bound 5 TI features

Tournsize 3 GA

Initial population 200 GA

Number of generations 20 GA

Crossover probability 0.5 GA

Mutation probability 0.2 GA

Train size 0.8 FNN

Validation size 0.2 FNN

Activation function ReLU FNN

Epochs 100 FNN

Batch Size 32 FNN

Optimizer Adam FNN

Transaction costs 0.0001 Trading module

Number of assets 80 000 Trading module

As it is possible to check in table 4.4, 4 components of the system were tuned.

• For the TI upper and lower bound we selected a range from 5 to 100, in order to have indicators that work
with small, medium, and long time periods. Also, with this specification, we are expanding the GA search
space, extending the number of possible combinations used during the feature selection process. It is
also worth mentioning that, as input features, we selected all the TIs announced in section ??.

• In terms of GA, we tuned 5 parameters. We selected a tournsize value of 3, which makes the Tournament
Selection mechanism (section 2.2.2) run tournaments of 3 individuals. The GA initializes itself with an initial
population of 200 individuals. We chose this value according to R. Storn [24], who states that the number
of initially created individuals should be 10 times the dimensionality of the problem; since on average
each individual accounts for 20 features, we considered that starting with 200 chromosomes would be a
suitable value for this problem. The number of generations was set to 20 due to the lack of computing power.
Greater values could achieve better results, but that would greatly increase the computing time, which
is already expensive when working with 20 generations (approximately 32 hours). Concerning the other
two GA operators, Crossover and Mutation, we selected triggering probabilities of 0.5 and 0.2, which
are standard values for both operators.

• With respect to FNN related settings, we set up 6 parameters. The first two, Train size and Validation size,
simply establish the percentage of data used for its creation. Note that the value
of Train size corresponds to a percentage of the whole dataset, while Validation size indicates
the percentage of the train set used for validation. For neural activations we selected the ReLU function due
to the advantages that it brings in terms of algorithmic convergence [44]. The number of epochs decided
for each individual is 100, since the implemented early stopping mechanism (section 3.5), with a patience level
of 20 epochs, rarely lets the propagation procedure reach the 100 epochs. Batch size is set to 32 for
2 main reasons: first, 32 is a commonly used standard value, and second, lower values of batch size
improve the overall convergence of the network [45]. Finally, the optimizer, which establishes the algorithm
used in the backpropagation procedure, is Adam, an extension of Stochastic Gradient Descent that has
proven to be effective among a large variety of domains [46]. We also have an extra parameter
that controls the introduction of a Batch Normalization layer in the network, but since its inclusion will be
changed during the experimental process, we decided not to include it here.

• As trading parameters, we defined the transaction costs and the number of assets as 0.0001% and 80
000. The initial capital will be market dependent, due to the differences in currency strength presented by
the selected markets. Another parameter that is not specified in this setup is the used fitness function.
We did not define it here because it will be changed during the experimental procedure, in order to
evaluate the system behaviour when changing the fitness function.

4.4 Case study A - Simple prediction

In this first case study, we intend to showcase the system behaviour without performing any type of opti-
mization to enhance the model performance. The idea is to get the most simple, pure predictions from the
created FNN, using as input vector a GA individual created from a single run. Parameters are going to be set
according to the experimental setup (section 4.3). The evaluation methodology is going to be based on the previously
announced metrics (section 4.2). Predictions are going to be performed on the datasets presented in section
4.1. The used input features are going to be all the ones announced in section ??. Note that not all of them
are going to be selected, since feature selection is always performed by the GA.

4.4.1 Classification results

The first way to assess the performance of a simple prediction of a single individual is to evaluate the
system according to traditional classification metrics. We present the validation and test accuracy because it
is extremely important to keep track of how their performance differs. Test precision and recall give us
an idea of how well the system is hitting or missing the ground truth results. For the 5 different markets we
obtained:

Table 4.5: Classification results

Market Train ACC Validation ACC Test ACC Test Precision Test Recall

EUR/USD 54.56% 50.34% 50.99% 56.82% 50.50%

GBP/USD 52.46% 50.76% 50.41% 15.12% 50.46%

GBP/JPY 53.36% 50.37% 50.87% 32.14% 50.63%

USD/JPY 54.39% 52.16% 50.37% 47.13% 49.91%

USD/CHF 54.58% 51.01% 50.08% 35.03% 50.57%

The above obtained results reflect an average of each monitored metric throughout 10 different system
runs in each one of the presented markets. We showcase accuracy for 3 different sets. First, for the train
set, we present the final accuracy shown by the prediction model when given the train set as test. This
displays how well the FNN model learned the given train data. Both Train ACC and Validation ACC represent
the final values of the FNN in the final neural epoch. Note that the implemented backpropagation algorithm,
backed up by the early stopping mechanism, is going to decide when the network reached the final epoch
by searching for the best ever seen accuracy in validation. The performance on the validation set (a cluster
of held-out, never seen data used as a test set during the optimization period) reflects how well the system
behaved in terms of predicting next hour market variations. Since the backpropagation process optimizes the
initially selected fitness function on the validation set, we also showcase Test ACC, a metric that represents
accuracy measurements in new data, the test set.

Table 4.6: Classification results with Batch Normalization

Market Train ACC Validation ACC Test ACC Test Precision Test Recall

EUR/USD 58.33% 51.85% 51.03% 60.74% 50.51%

GBP/USD 59.65% 52.5% 51.61% 48.55% 50.84%

GBP/JPY 62.23% 51.02% 50.64% 2.69% 35.33%

USD/JPY 61.15% 51.46% 51.05% 31.26% 50.89%

USD/CHF 59.48% 51.06% 50.77% 49.15% 51.08%

For the sake of model comparison and system optimization, we also studied the impact of having a Batch
Normalization layer in the created FNN. This technique is known for speeding up the training procedure, en-
hancing the FNN learning capabilities (section 3.5). By normalizing the inputs, we assure that the activations distribution
remains Gaussian, by forcing it to have zero mean and unit variance. By linearly transforming the given data,
we increase the training speed, improving the overall convergence of the network. With the selected parame-
ters, each batch of 32 data records is going to be normalized between the first and the second layer
of the network. The hypothesis here was to understand if higher train accuracies could ultimately lead to im-
proved results in terms of test accuracy and ROI. Applying this methodology, we obtained the results presented
in table 4.6. As it is possible to notice, the introduction of a Batch Normalization layer significantly changed the
results for each market. Comparing the obtained outcomes, the following observations can be made:

• As was expected, the application of a Batch Normalization layer improves every market's results in terms
of training accuracy. This empirically shows that Batch Normalization accelerates the training procedure
of the FNN model, making the network converge faster and learn more from the given data. We were
able to achieve an average improvement of 11.27%, which is an extremely positive indicator in favour of
Batch Normalization usage.

• Validation accuracy also improved with Batch Normalization. This is a promising achievement for the
rest of this work, since in this test case we are only dealing with one single individual (no optimization is
performed on the validation set). We obtained an average improvement of 1.28%, which at least indicates
some enhancement of the prediction capabilities of the network. It is also important to notice that this
metric was not improved in some tested markets. For the USD/JPY market, the inclusion of Batch
Normalization reduced the performance from 52.16% to 51.46%, and in the USD/CHF currency pair it
resulted in a minor improvement. We believe that this could be related to the high volatility presented
by this currency pair.

• In this particular case study, the achieved test accuracy is obtained in a similar way to validation accuracy,
since they are both collected from held-out data, never seen by the system. We simply decided to keep
both in order to better understand how the created model works when trying to predict events that break
the temporal order sequentiality displayed by financial time series. By introducing Batch Normalization
we accomplished a 0.7% accuracy improvement when predicting on the available test set. A
comparison between a normalized and a non-normalized approach is displayed in fig. 4.6.

Figure 4.6: Non-normalized test ACC vs Normalized test ACC

• Precision and recall do not directly improve with the introduction of Batch Normalization. The differences
between the two approaches simply reflect how the trained system adjusts to data and forecasts price
return variations.

4.4.2 Market simulator

The second way to test the model is to evaluate it under financial metrics. We showcase the validation and
test ROI since they present how well the model performed in terms of actual gains on held-out data. The amount of
available investment capital varies in terms of currency, since the system is tested in the 5 announced markets
and currency values are adjusted to always enable the purchase of 80 000 market positions. We also present
the number of times that, on average, the system assumed long, short and hold positions.

Table 4.7: Financial results

Market Validation ROI Test ROI Initial Capital Long Short Hold

EUR/USD -13.59% -9.94% 120 000 USD 478 477 5249

GBP/USD 5.18% 4.05% 120 000 USD 154 153 5897

GBP/JPY -0.49% -10.37% 13 200 000 YEN 444 444 5316

USD/JPY -7.47% -9.70% 13 200 000 YEN 516 515 5172

USD/CHF -9.33% -9.79% 120 000 CHF 538 536 5129

The results presented in table 4.7 were obtained using the same methodology performed for the classification
metrics, i.e, as an average of 10 separate runs for each one of the analyzed currency pairs. Similarly to
what was performed for analyzing the predictive capabilities of the constructed model, we also added a Batch
Normalization layer to the neural network, in order to study if an extra layer of normalization contributes to
greater market returns, both in the validation set and the test set. It is also intended to analyze the relation between
accuracy and ROI results, i.e, if the improvement of one of the metrics could lead to the improvement of the
other.

Table 4.8: Financial results with Batch Normalization

Market Validation ROI Test ROI Initial Capital Long Short Hold

EUR/USD -6.69% -8.29% 120 000 USD 276 275 5652

GBP/USD -10.77% -2.35% 120 000 USD 278 277 5649

GBP/JPY -3.00% 1.58% 13 200 000 YEN 52 51 6102

USD/JPY 2.82% -2.09% 13 200 000 YEN 223 222 5758

USD/CHF -2.70% -0.45% 120 000 CHF 238 237 5729

With the introduction of Batch Normalization, it is possible to identify some variations when comparing the
obtained Financial metrics, with the non-normalized approach. We concluded:

• For the calculated validation ROI, we concluded that not all the tested pairs improve their performance
with the introduction of Batch Normalization. Markets like GBP/JPY and GBP/USD displayed a sub-
stantial decrease in return, both in validation and test (table 4.8). It seems that for the GBP/USD market,
the additional layer makes the system increase the number of times that long and short investments
are performed, and since the used validation set presents a long bullish trend, having a reduced number
of investments could possibly be beneficial for raising the achieved gains. For the GBP/JPY market the
system presented a behaviour that appears to be exactly the opposite. Since the validation and test
sets are taken from an area where the GBP/JPY market is extremely volatile, the decrease of market
investments posed by the usage of Batch Normalization could likely reduce the performance achieved
with the non-normalized approach.

• When it comes to the achieved test ROI, the results seem to follow the pattern obtained for validation
ROI. The introduction of Batch Normalization enhances all currency pairs' performance with the exception of
the GBP/USD market (figure 4.8). Similarly to what happened with the application of this methodology to the
validation set, the GBP/USD market presented worse results when using the second approach (fig. 4.7).
The behaviour of the markets involving the JPY currency also seems to vary when comparing different
approaches. We believe that this may be related to the high volatility presented by these markets (table
4.1), and also to the presented trend shifts when comparing their test and validation sets. For the
other two markets, EUR/USD and USD/CHF, the introduction of Batch Normalization remains
beneficial, especially in the latter, where its inclusion led to significant improvements.

Figure 4.7: Non-normalized test ROI vs Normalized test ROI
Figure 4.8: Test ACC vs Test ROI

• Regarding assumed market positions, it is also possible to find a pattern in the above performed exper-
iments. Batch Normalization reduces the number of open short and long positions in every analyzed
market with the exception of GBP/USD. This confirms the hypothesis that for this market, a higher number
of investments results in worse returns.

4.5 Case study B.1 - Accuracy as fitness function

In this second test case, we plan to study how accuracy impacts the overall performance when used as
the GA fitness function. The main idea is to let the system create several individuals following the GA evolutionary
breeding process, with the final goal of maximizing the achieved accuracy, i.e, finding the individual that max-
imizes this metric. The optimization process is performed on the validation set, and for the fittest individual
the associated FNN model weights are saved, in order to later load the network and evaluate the test
set under the same experimental conditions. Therefore, the intended experiment is to examine and compare
whether the performance on validation and test data is enhanced when maximizing only on validation, ultimately
studying if ROI is improved by this approach. The used system configuration is the one presented in section
4.3. Regarding Batch Normalization, due to the positive results achieved in the majority of markets, we decided
to maintain it during this test case.

4.5.1 Classification results

Similarly to what was done in the last case study, we started by assessing the system predictive performance.
Table 4.9 displays each market's results, with each single outcome being the result of averaging 10 separate
system runs.

Table 4.9: Classification results with ACC fitness

Market Train ACC Validation ACC Test ACC Test Precision Test Recall

EUR/USD 63.15% 53.89% 49.92% 55.65% 49.71%

GBP/USD 66.16% 62.78% 60.55% 66.09% 63.97%

GBP/JPY 63.11% 53.05% 51.41% 48.82% 50.81%

USD/JPY 62.49% 52.54% 50.49% 51.42% 49.79%

USD/CHF 59.48% 52.94% 50.59% 45.34% 51.34%

• The introduction of the accuracy fitness led both train and validation ACC to a substantial increase in
every presented market. However, when analyzing the results, the superior perfor-
mance of the system when optimized for the GBP/USD market is clearly noticeable. Results are aligned with what was show-
cased during the simple prediction approach, with GBP/USD having the highest results. This indicates a
higher capacity towards learning GBP/USD data. Test ACC shows that optimization on the validation set
also guarantees valuable results on test, suggesting that the GA made the model converge to a solution
where some patterns and non-linearities of validation data still hold true for test data.

• Improved results of Validation ACC do not increase the predictive capacity for all the tested currencies.
Results show that in some cases the performance was worse than what was achieved in the simple
prediction approach.

• Precision and recall, as expected, improved and seem to be more stable, since they are both components
of the test accuracy.

4.5.2 Market Simulator

Besides the purely predictive capacities assessed with the above measurements, investment performance
was also evaluated. Market simulations were also performed for the 5 selected markets. In addition to the
measurements taken for the simple prediction test case (section 3.6), we decided to also keep track of the
Maximum Drawdown (section 4.2.2). This metric is introduced only in the optimization approach, in order
to track and present to the user the risk involved during the test set period, indicating the
possible setbacks during future investment times. Results are displayed in table 4.10.

Table 4.10: Financial results with ACC Fitness

Market Validation ROI Test ROI MDD Long Short Hold

EUR/USD -9.89% -13.86% 6.72% 347 346 5513

GBP/USD -31.95% -37.54% 40.12% 1090 1090 4037

GBP/JPY -12.91% -0.36% 6.22% 367 367 5470

USD/JPY 5.06% -2.40% 5.10% 113 112 5985

USD/CHF -2.19% -1.82% 6.72% 199 198 5809

• The achieved results indicate worse returns in all markets when compared to the non-optimized version
of the system. This may suggest that, for the developed system, accuracy is not a well-suited metric for
conducting valuable market investments.

Figure 4.9: Maximum Drawdown for GBP/USD

• The worst returns are obtained in the GBP/USD market. Such results indicate that the
higher predictive capacity achieved in this currency pair, 62.78% for Validation ACC and 60.55% for Test
ACC, is not reflected in terms of actual gains. The high number of long and short investments follows
the initially stated hypothesis that for the GBP/USD market a higher number of investments may decrease
the system performance. The developed strategy seems not to be suitable for this type of market, also
presenting the highest risk among all the tested currencies. Figure 4.9 presents the evolution of this
metric throughout the entire market simulation for one of the executed runs, in which it is possible to
see that for the majority of the time spent in the market, the system is only consuming the invested capital,
without being capable of generating any profit.

4.6 Case study B.2 - ROI as fitness function

Similarly to what was done in the previous case study, we also decided to test the system using ROI as the GA
fitness function. This way, we make the GA evolutionary process search for the fittest
individual in terms of validation ROI and check if its superior performance also holds true for the test set.
The used system configuration is the same as the one used in the previous case study (section 4.5). The followed
methodology is also the same, with classification and financial results being presented.

4.6.1 Classification results

Table 4.11 presents the predictive capacities of the optimized model using ROI as fitness function. The idea
is to showcase how optimizing a different fitness function impacts the achieved goodness of fit in each studied
market.

Table 4.11: Classification results with ROI fitness

Market Train ACC Validation ACC Test ACC Test Precision Test Recall

EUR/USD 59.88% 50.19% 49.58% 65.44% 49.33%

GBP/USD 61.59% 51.36% 50.79% 27.85% 54.68%

GBP/JPY 60.73% 51.01% 50.64% 23.83% 49.76%

USD/JPY 60.48% 50.41% 50.06% 68.43% 49.59%

USD/CHF 59.63% 50.71% 50.67% 42.92% 51.17%

• The presented results indicate low accuracy both in validation and test. This is somewhat ex-
pected, since ROI is the maximized function in the GA architecture. However, this may not be indicative
of an inferior market performance: although the FNN may not be capable of predicting the majority
of positive and negative market returns, if it still has the capability to correctly predict some of the higher-
profit entry and exit market points, the returns could still be positive.

• Once again, the best performance is shown by the GBP/USD market. This shows that the system's
predictive capabilities are more prone to correctly forecast in this market, and despite the fitness function
not accounting for the percentage of correctly classified individuals, the system still has a higher learning
capacity when compared to the learning process of the other markets.

4.6.2 Market Simulator

Table 4.12 presents the results of ROI in each evaluated market.

Table 4.12: Financial results with ROI fitness

Market    Validation ROI   Test ROI   MDD     Long   Short   Hold
EUR/USD   14.92%           -1.93%     5.91%   52     53      6167
GBP/USD   13.40%           5.39%      6.08%   104    103     5961
GBP/JPY   12.29%           -2.34%     7.14%   144    144     5918
USD/JPY   9.79%            -1.21%     6.65%   103    102     6011
USD/CHF   10.41%           4.55%      3.17%   56     55      6077

• Comparing the results with the ones obtained in the simple prediction approach, every evaluated market
improved its performance in terms of validation ROI, which is the anticipated result. The range of
obtained values gives an idea of the average performance of the best individual throughout the 20
generations in which the system was optimized.

• In terms of market transactions, on average, the number of long and short positions is kept small. The
system is more likely to favour a low number of investments.

• Regarding the achieved test ROI, the system exhibits a positive performance in the GBP/USD market,
empirically showing that it is possible to achieve positive results in this market. This follows what
was previously measured in section 4.1.1, where Kurtosis values indicate the presence of ”fat tails” in
the return distribution, with GBP/USD being the second highest market concerning this measurement.
Figure 4.10 presents the achieved ROI, showing the average, best and worst individuals generated
throughout 10 system runs, calculated in hourly periods during the test period. For the best and worst
individuals we got a ROI of 11.51% and 2.98% respectively, which indicates a substantial amount
of variance in the obtained results. However, it is worth noting the system's ability to only obtain
positive results during the 10 performed system runs.

• The USD/CHF market also presents a positive test ROI. However, the achieved performance is slightly
lower when compared to the GBP/USD market. We believe that this may be due to the extremely
oscillatory behavior and low volatility presented by this market. This is also supported by the drawdown
values, which are the lowest among all the tested pairs. Similarly to what was displayed for the
GBP/USD pair, figure 4.11 shows the ROI results for the average, best and worst individuals, sampled
in hourly periods during the test period. For the best and worst individuals we obtained a ROI of 6.35%
and 3.40% respectively, which showcases the stability and low variance of the results produced by this
system solution.

• The above results confirm the hypothesis that using ROI as GA fitness function yields better results than
using a fitness function purely based on the classification capabilities of the system, such as accuracy in
section 4.5.

Figure 4.10: Best, average and worst system individuals for GBP/USD

Figure 4.11: Best, average and worst system individuals for USD/CHF

4.7 Case Study 3 - Further investigation on profitable markets

This section extends and deepens the previously performed system analysis for the specific set of currency
pairs that achieved promising results in the proposed case studies. We felt that such an approach was needed
in order to better understand how the system works in profitable market simulations, analyzing its decisions
and results in greater depth. Additionally, the large number of tested models, accounting for different system
parameter tunings, poses a computationally intensive problem in terms of power and time, and could
potentially not be beneficial to find the best system architecture for each one of the displayed markets.
Therefore, it seemed reasonable to only extend this analysis to currencies that had already displayed superior
behaviour.
Following the mentioned approach, we decided to conduct the investigation on the GBP/USD and USD/CHF
markets. We based this decision on the results achieved in the prior case studies, where both markets displayed
promising outcomes. The selected base architecture for further experiments is the initially selected
configuration (section 4.3) with ROI as the fitness function. The experiments performed in sections 4.5 and
4.6 confirm that, for the studied markets, using ROI over accuracy as GA fitness function improves the overall
system performance, ultimately leading to better results. This is the case for the two selected currency pairs,
which were the only ones able to generate a profitable return during the test period when optimized for ROI.

4.7.1 Benchmark comparisons

This section focuses on comparing and evaluating the results obtained by the two proposed investment solutions
against traditional trading benchmarks. This is crucial in order to assess the overall model stability and trading
utility in terms of profit generation, beyond the positive returns already seen in both markets (GBP/USD and
USD/CHF) in section 4.6. All the strategies are compared using ROI as the comparative metric. It is also
important to mention that transaction costs were set to 0.0001% of the opened position for every used strategy.
As comparative benchmarks we selected the following 3 trading strategies (a minimal sketch of their return
curves is given after the list):

• Buy & Hold: This classical approach relies upon the belief that prices move in long bullish trends, and
that it is not possible to forecast market variations relying on past data. Therefore, a Buy & Hold strategy
is a passive investment where the trader opens a long position and holds it for a long period of time until
an opinion reversal.

• Sell & Hold: The Sell & Hold strategy is another classical strategy, similar to Buy & Hold, but based
on the belief that prices move in a long bearish trend. It also results in a passive investment operation,
where the trader opens a short position and closes it only after a change of opinion.

• Random Walk: This trading benchmark results from the Random Walk Theory [1], which states that
market fluctuations are randomly generated and completely unpredictable through the usage of historical
market information. Hence, this strategy is applied by generating a binary random signal, analogous
to the y label vector created during the system workflow. Contrary to the other two presented market
strategies, which only rely on one single market operation, the Random Walk is able to open both long
and short market positions.
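
As a minimal sketch of these three benchmarks, and assuming close is a pandas Series of hourly closing prices for the test period (transaction costs omitted for brevity, and the compounded short formula being a simple approximation of a continuously held short), the return curves could be computed as:

    import numpy as np
    import pandas as pd

    def benchmark_curves(close: pd.Series, seed: int = 0) -> pd.DataFrame:
        ret = close.pct_change().fillna(0.0)       # hourly returns
        buy_hold = (1 + ret).cumprod() - 1         # single long position
        sell_hold = (1 - ret).cumprod() - 1        # approximated held short
        # Random Walk: a binary random signal analogous to the y label vector
        # (+1 keeps a long position, -1 a short one), applied one hour later.
        rng = np.random.default_rng(seed)
        signal = pd.Series(rng.choice([-1, 1], size=len(close)), index=close.index)
        random_walk = (1 + signal.shift(1).fillna(0) * ret).cumprod() - 1
        return pd.DataFrame({"Buy&Hold": buy_hold, "Sell&Hold": sell_hold,
                             "Random": random_walk})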

The following comparisons will serve as an empirical confirmation that the deployed system is suitable for
possible market investments in the GBP/USD and USD/CHF markets. The 3 benchmarks were specifically
selected to test if the created strategy is capable of reacting to different market conditions. The Buy & Hold
and Sell & Hold tests were picked because each market has an underlying bullish or bearish trend in the
selected testing period. The Random Walk strategy is used to check that the created model is not beaten by a
purely random strategy, which would eventually confirm the previously outlined Efficient Market Hypothesis
(section 1).

4.7.2 USD/CHF

Table 4.13 presents a comparison between the USD/CHF average, best and worst individuals and the 3
benchmark strategies mentioned above. As comparative indicators we selected four. The first one is obviously
the ROI, which is the foundation of this entire comparison and serves as the base for the other used metrics.
Profitable transactions displays the percentage of profitable transactions among all the ones performed during
trading. The third metric, days with positive ROI, gives the user an idea of how risky each strategy is in
terms of stability and sustainable growth. Finally, Maximum Drawdown is used again, but this time accounting
for drawdowns present in past ROI data, instead of returns or portfolio value (a minimal sketch of these
metrics follows this paragraph). Figure 4.12 presents the evolution of the 3 benchmark strategies against
the proposed system.
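
The sketch below assumes roi is a pandas Series with the hourly cumulative ROI (indexed by timestamps) and trade_pnl a Series with the profit of each closed transaction; both names are illustrative, not actual system components:

    import pandas as pd

    def strategy_metrics(roi: pd.Series, trade_pnl: pd.Series) -> dict:
        profitable = (trade_pnl > 0).mean() * 100            # % winning trades
        positive_days = (roi.resample("D").last() > 0).mean() * 100
        max_drawdown = (roi.cummax() - roi).max() * 100      # on past ROI data
        return {"ROI": roi.iloc[-1] * 100,
                "Profitable transactions": profitable,
                "Days with positive ROI": positive_days,
                "Maximum Drawdown": max_drawdown}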

Table 4.13: USD/CHF strategies comparison

Parameters                Avg      Best     Worst    Buy&Hold   Sell&Hold   Random
ROI                       4.55%    6.35%    3.41%    -4.09%     4.09%       -12.98%
Profitable transactions   67.84%   76.05%   58.27%   0%         100%        21.55%
Days with positive ROI    83.11%   86.71%   83.41%   2.51%      99.72%      2.25%
Maximum Drawdown          3.18%    3.17%    3.48%    6.05%      3.98%       29.29%

Figure 4.12: USD/CHF strategies evolution over time

The proposed solution for the USD/CHF currency pair exhibits a promising behavior, illustrated in figure
4.12. Both the Buy&Hold and the Random Walk strategies are clearly outperformed by the proposed solution,
with the two reaching negative ROIs of -4.09% and -12.98% respectively, not being able to provide any return
to the user at any point during the market simulation period, as explained by their constantly decaying curves.
The Sell&Hold strategy is the only benchmark capable of presenting a performance suited to the behavior
displayed by the USD/CHF. As it is possible to see in figure 4.12, the Sell&Hold is only beaten by the developed
strategy by the end of the trading year, reaching a ROI of 4.09%, a value extremely close to the average ROI of
4.55% shown by the system. However, when examining its evolution, it is possible to see a high volatility
throughout time, which indicates undesirable risk to the trader. In contrast, the proposed system displays a
steady growth during the whole trading period. It is also worth noting its safer and less risky conduct during
the 10 performed system runs, which can be seen in the small range of ROI values that separate the worst
and best individuals. The less risky profile assumed by the developed solution can be identified through
the values presented in the Maximum Drawdown row, which indicate the biggest drawdown during the trading
period in terms of ROI. Regarding that metric, we can confidently state that the proposed solution is less risky
than all the benchmark strategies.

Figure 4.13: USD/CHF market entry points

In figure 4.13 it is possible to observe the 72 short and long positions opened by the best proposed system.
The densest presented area, starting at 2017-10, represents the turnover point for the presented solution. This
is the point where the proposed solution surpassed the Sell&Hold strategy (visible in figure 4.12, on the same
date), outperforming it until the end of the trading period. Moreover, despite not being able to stay at
its highest ROI value, the system still remains profitable in the beginning of 2018, a period where the trend
inverted its path, making the index decrease to the lowest market quote available in the performed test. Such
abilities show the capacity of the algorithm to deal with both bullish and bearish market periods and, although
presenting a slow and steady growth, to still remain profitable in this market.

4.7.3 GBP/USD

Similarly to what was done for the USD/CHF market, table 4.14 presents a comparison between the average,
best and worst individuals and the 3 selected benchmark strategies. The presented measurements reveal that
the Sell&Hold and Random Walk strategies provide negative ROIs of -10.69% and -15.78%. We can confirm this
information by looking at figure 4.14, and conclude that, from the 3 benchmarks, only the Buy&Hold strategy is
capable of adapting to the GBP/USD index, mainly due to the long bullish trend presented in the selected trading
period. In fact, the Buy&Hold strategy presents an extremely profitable evolution, reaching a return
of 10.69%. It is therefore possible to notice the ability of the proposed system to create conservative strategies
similar to the Buy&Hold.

Table 4.14: GBP/USD strategies comparison

Parameters                Avg      Best     Worst    Buy&Hold   Sell&Hold   Random
ROI                       5.65%    11.51%   2.98%    10.69%     -10.69%     -15.78%
Profitable transactions   45.87%   47.37%   42.45%   100%       0%          14.52%
Days with positive ROI    98.77%   99.95%   97.47%   99.38%     1.59%       4.05%
Maximum Drawdown          5.97%    4.24%    5.76%    3.94%      14.58%      36.09%

Figure 4.14: GBP/USD strategies evolution over time

This happens because, during the system optimization period, the best individuals behave in a similar way,
favouring a small number of open market positions, heavily shaped by the way the market grows. However,
when comparing with the achieved results, it is possible to see that only the best individual was capable of
outperforming it, achieving a ROI of 11.51%. Both the average and worst individuals were not capable of
surpassing it, which indicates that the proposed strategy presents a lot of variance throughout the 10 system
runs. We can confirm that by looking at table 4.14, where the displayed ROI results for the average and worst
individuals achieved disappointing values of 5.65% and 2.98%, respectively. Therefore, it is possible to conclude
that this strategy is not capable of beating the Buy&Hold strategy, which revealed to be more profitable and
less risky, with a Maximum Drawdown of 3.94%. Still regarding the risk involved in each strategy, Sell&Hold
and Random Walk present an extremely high Maximum Drawdown due to their continuous bearish trends.

4.7.4 GBP/USD without Batch Normalization

Since the last experiment showed that the deployed system was not capable of developing a sufficiently reliable
investment strategy for the GBP/USD market, we decided to extend the experimental process in order to
improve the achieved results. Therefore, we proceeded with some architectural changes towards a more stable
configuration, able to create a new trading strategy that would manage to produce a performance superior to
the Buy&Hold benchmark. The main idea is to develop a new version of the proposed system in which the
majority of runs creates individuals capable of surpassing the selected benchmark, which would ultimately
result in a high profit average individual. The adopted approach was based on the measurements taken
throughout the system analysis, especially the ones obtained in the first case study (4.4), where we focused
on measuring the results of a non-optimized approach, and also on the inclusion of a Batch Normalization
layer in the FNN internal architecture. By looking into the results obtained during the market simulation, it is
possible to see that the introduction of a Batch Normalization layer yielded better results for every tested
market except the GBP/USD, which led to its inclusion in the model in case studies B.1 and B.2 (4.5 and 4.6).
Hence, based on those results, we decided to test the system without the previously introduced Batch
Normalization layer. The achieved results are presented in table 4.15.

Table 4.15: GBP/USD without BN strategies comparison

Parameters                Avg      Best     Worst    Buy&Hold   Sell&Hold   Random
ROI                       14.19%   17.81%   8.08%    10.69%     -10.69%     -19.36%
Profitable transactions   45.87%   50.85%   46.53%   100%       0%          12.78%
Days with positive ROI    99.79%   99.91%   99.74%   99.36%     1.61%       1.53%
Maximum Drawdown          2.94%    4.45%    7.88%    3.94%      14.58%      40.25%

By analyzing the above results, it is possible to conclude that the system performance was greatly improved
by the removal of the batch normalization layer. Figure 4.15 depicts the evolution of the new configuration
against the 3 benchmarks. Since the tested market remains unchanged, we can immediately exclude the
Sell&Hold and Random Walk strategies, as they are not suited to the evolution of the market index. For
the new proposed solution, the achieved average ROI is 14.19%, a value that significantly beats the Buy&Hold
ROI, indicating less variance in the produced solution. Besides that, the system was also able to improve the
results returned by the best and worst individuals. Regarding them, we achieved a ROI of 17.81% for the best
individual, a value that clearly outperforms the benchmark strategy. For the worst individual we also improved
the system efficiency, achieving a ROI of 8.08%, a value closer to the Buy&Hold ROI than what we were able
to get from the normalized version of this experiment. We believe that this approach led to better results
because it reduced the predictive capacity achieved by the system. Throughout the entire experimental test,
which included all the available case studies, it is noticeable that the system is more prone to learn the
GBP/USD market. Therefore, the introduction of a batch normalization layer, a technique that seeks accuracy
improvement, drove each individual FNN to enhanced results in training, creating an overfitted model that
learnt the training data too well. We can confirm this by looking into each case study table and checking that
the GBP/USD train accuracy is always superior to its peers.
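
For reference, the sketch below outlines the two FNN variants compared in this experiment using the Keras [41] Sequential API; it assumes the ReLU activations [44], a sigmoid output for the binary signal, and the Adam optimizer [46], with the topology values taken from the GBP/USD solution of table 4.17:

    from keras.models import Sequential
    from keras.layers import Dense, BatchNormalization

    def build_fnn(n_inputs, layer1, layer2, use_batch_norm):
        model = Sequential()
        model.add(Dense(layer1, activation="relu", input_shape=(n_inputs,)))
        if use_batch_norm:
            # Variant used in case studies B.1/B.2, removed here for GBP/USD.
            model.add(BatchNormalization())
        model.add(Dense(layer2, activation="relu"))
        model.add(Dense(1, activation="sigmoid"))  # binary up/down output
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_fnn(n_inputs=11, layer1=80, layer2=63, use_batch_norm=False)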

Figure 4.15: GBP/USD without BN strategies evolution over time

Moreover, by analyzing the results obtained in tables 4.9 and 4.10, in case study 4.5, we can observe
that the optimization procedure was able to find extremely fitted individuals that held the best performance for
train, validation and test accuracy, but when investing with those individuals, the average ROI results, both
in validation and test, displayed a poor performance. This suggests that the model was capable of learning
certain parts of the training data but, by creating an excessively active strategy (1090 long and 1090 short
positions), was not able to generalize and be conscious of the overall market trend, assuming a reactive
behaviour that attempted to predict all the small return variations. This confirms the hypothesis that Batch
Normalization is not suited for this market, and that its usage undermines the trading capacity of the created
system. Figure 4.16 displays the positions opened by the best individual during the trading period.

Figure 4.16: GBP/USD without BN market entry points

4.7.5 Feature selection results

In this section we focus on analyzing the impact of feature selection on the two proposed solutions, both
for the GBP/USD and USD/CHF markets. This includes the number and type of selected features, in this case
technical indicators, among all the implemented ones. Since the system configuration remains unchanged in
comparison to the last experimental analysis, the enabled features are EMA, SMA, RSI, MOM, ATR, ATRP,
BB, ADX, AA, CMO, DPO, DEMA, ROC, DSS, KURT, SKEW, STD, STV, CCI, MACD and PO. The
optimization procedure carried out by the GA selects relevant features among all these, defining and
tuning the parameters of each TI as well, following the evolutionary optimization process described in section
3.4 (a minimal decoding sketch is given below). Additionally, both solutions are also analyzed in terms of
internal model structure. This is also defined by the GA optimization methodology, by selecting the number of
neurons present in each neural layer of the final created FNN. Table 4.16 presents the selected features for
each one of the proposed solutions, with the respective parameters, both resulting from the evolutionary
process of the GA.
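
The toy sketch below illustrates one possible decoding of the feature-selection genes into the (TI, period) pairs of table 4.16; the exact gene layout shown here is an assumption made for illustration, not the system's actual chromosome encoding:

    ENABLED_TIS = ["EMA", "SMA", "RSI", "MOM", "ATR", "ATRP", "BB", "ADX", "AA",
                   "CMO", "DPO", "DEMA", "ROC", "DSS", "KURT", "SKEW", "STD",
                   "STV", "CCI", "MACD", "PO"]

    def decode_features(mask_genes, period_genes):
        # mask_genes: one binary gene per TI (1 = feature kept by the GA);
        # period_genes: one integer gene per TI holding its look-back period.
        return [(ti, period)
                for ti, keep, period in zip(ENABLED_TIS, mask_genes, period_genes)
                if keep == 1]

    # Toy chromosome reproducing the first two GBP/USD features of table 4.16.
    mask = [1, 1] + [0] * (len(ENABLED_TIS) - 2)
    periods = [97, 50] + [0] * (len(ENABLED_TIS) - 2)
    print(decode_features(mask, periods))  # [('EMA', 97), ('SMA', 50)]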

Table 4.16: Selected TI

Market    Selected features
GBP/USD   (EMA, 97), (SMA, 50), (AA DOWN, 84), (DSS, 99), (SKEW, 83),
          (STV, 72), (DEMA, 24), (PO, 88)
USD/CHF   (EMA, 81), (SMA, 77), (HMA, 83), (AA UP, 35), (LOW BB, 9),
          (MID BB, 50), (DSS, 98), (ATRP, 57), (STD, 18), (DEMA, 85)

The above results denote one common tendency across the two markets. Although the system has at its
disposal 22 different TIs, we can clearly notice that the most profitable individual, for both
currency pairs, only uses half of them or less. In terms of selected TI parameters, it is also possible to notice
that the GA seems to be more prone to use TIs that account for a large window of past hours, which is probably
related to the extensive size of the validation period. In order to analyze the usage of features throughout
the optimization procedure, we plotted two histograms, figures 4.17 and 4.18, that show the distribution of used
features across each GA generation, with the red vertical dashed line representing the mean value.

Figure 4.17: GBP/USD histogram

Regarding the presented figures, it is important to mention that the showcased results also include the 3
features to which the selected ones were aggregated, namely Close, High and Low. The figures show that,
over 20 generations of 200 initially created individuals, both solutions present an average number of used
features of approximately 13 and 14, throughout the entire system run. For the GBP/USD we can
conclude that the achieved result is not in agreement with the presented histogram, with 11 features only being
used by approximately 600 individuals. For the USD/CHF the achieved results are distinct, and the majority of
the population uses the same number of features as the proposed solution. To complement this information,
we also present in figure 4.19 the evolution of the average number of used features across the 20 generations
of the GA. The displayed values show that, for both currencies, the system individuals converge to values close
to the previously calculated mean. This confirms that the feature selection method integrated in the GA is
functioning, decreasing the number of features and reducing the variance of each generation's results.

Figure 4.18: USD/CHF histogram

Figure 4.19: USD/CHF and GBP/USD number of features over generation

4.7.6 Fitness evolution

In order to further validate the GA evolutionary optimization process, it is also important to analyze how
the validation ROI evolves over the GA generations. We decided to create two box-and-whisker plots
showing the validation ROI of each individual in every generation, in fig. 4.20 and fig. 4.21. Box-and-whisker
plots are useful to understand the variability of the data in each one of the GA generations. The whiskers
present the maximum and minimum values encountered in that generation, while the box displays the range
between the data's first and third quartiles (interquartile range - IQR), with both quartiles being separated by
the ROI median value (the black line inside each box). A minimal sketch of how such a plot can be produced
is given below.
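
The following sketch uses matplotlib and assumes history is a DataFrame with one row per individual and the illustrative columns generation and validation_roi:

    import matplotlib.pyplot as plt
    import pandas as pd

    def plot_fitness_boxes(history: pd.DataFrame) -> None:
        data = [g["validation_roi"].values
                for _, g in history.groupby("generation")]
        # whis=(0, 100) makes the whiskers span each generation's min/max.
        plt.boxplot(data, whis=(0, 100))
        plt.xlabel("Generations")
        plt.ylabel("Validation ROI")
        plt.show()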

Figure 4.20: GBP/USD box-and whisker plot

These two figures show that, in each generation, individuals are converging and achieving better
ROI scores. In figure 4.20, we can see that each generation's IQR is getting increasingly smaller, at the same
time that the median values are becoming progressively higher. These two factors combined indicate that at
least 50% of the created individuals are following the evolutionary procedure introduced by the GA. Values
above and below the IQR represent outliers (values that achieve surprisingly high/low ROI), which can always
be found throughout the optimization. This is the case of the best individual, which is given by the fittest value
of the fifteenth generation. This is also the case for the USD/CHF currency pair, where the fittest individual
can again be found at the fifteenth generation of the GA (figure 4.21). In terms of fitness optimization, this
market presents a slower convergence, with individuals improving at a slower rate when compared to GBP/USD.
It may seem that values are not converging, but if we look at figure 4.22, we can see that the average of each
generation is increasing towards fitter ROI values.

Figure 4.21: USD/CHF box-and whisker plot

Figure 4.22: USD/CHF roi vs gen

In fact, we can even confirm that USD/CHF provides a softer fitness convergence, with ROI values presenting
a smoother increase when compared to the oscillating behavior assumed by the GBP/USD evolution signal.
This shows that the best individuals are not always located in the last generations of the optimization process;
often the optimized solutions are given by outliers found in intermediate generations, which are subsequently
chosen by the GA selection mechanism for the evolutionary breeding procedure. This results in the application
of crossover and mutation operators to the best individuals, which in many cases does not implicitly create
the fittest individual, but rather the convergence of the whole population. Figure 4.22 presents the average
ROI evolution of both markets, with results revealing that the optimization procedure is working, with both
curves converging to higher values, maximizing the desired fitness function. The highest average ROI values
of each solution were found in the 19th and 20th generations of GBP/USD and USD/CHF respectively, which
also indicates that the values plotted in figure 4.22 are not created by high variation of ROI among same
generation individuals, and the highest value is achieved in the final phase of the optimization.

4.7.7 Topology evolution

Finally, to conclude the analysis of the proposed system solutions, we must also inspect the
architecture of the two final FNN models. As explained in section 3.4, the final optimized chromosome
encodes two genes that are responsible for the number of neurons used in the first and second layers of the
developed FNN. The system creates different architectures for the two obtained solutions. The number of
neurons used in each layer for GBP/USD and USD/CHF is presented in table 4.17.

Table 4.17: Solutions architecture

Market    Number of inputs   First layer neurons   Hidden layer neurons
GBP/USD   11                 80                    63
USD/CHF   13                 32                    40

It is also important to remember that the range of values obtained for each network layer is set by the initial
configuration file, with both solutions following the setup parameters presented in section 4.3. Therefore, the
assumed number of neurons ranges from 5 to 100. Furthermore, to complement this information and locate
the chosen values among all the selected ones, we decided to create two bar plots that display the 15 most
frequently used neuron-count genes, one for the first layer and another one for the second layer. Note that the
displayed genes are the ones with the highest number of occurrences during the optimization period, i.e.
throughout the 20 generations of the GA. For the GBP/USD we obtained the two plots in fig. 4.23 and fig.
4.24. As it is possible to see in figure 4.23, for the first layer of the FNN generated for the GBP/USD currency
pair, the value of 80 neurons is the most frequent one, which is in accordance with the selected solution. The
same happens for the second layer (fig. 4.24), with 63 also being the most frequent gene among the displayed
15. This may suggest that the GA converges to such values throughout the optimization procedure. We can
confirm this information in fig. A.1, where we plotted a violin plot that shows the evolution of the number of
selected neurons in layer 1 and layer 2 during the optimization procedure (a minimal sketch of such a plot is
given below). This is done by computing the kernel density estimation (KDE) of all the selected neuron values
during a single generation. Results show that as the algorithm proceeds, the choice of a small number of
neurons for both layers is substantially reduced, and in the last generations the selected values range between
40 and 100 neurons.
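
A minimal sketch of such a violin plot, using Seaborn [47] and assuming history is a long-format DataFrame with the illustrative columns generation, layer and neurons (one row per individual and layer):

    import matplotlib.pyplot as plt
    import seaborn as sns

    def plot_topology_evolution(history):
        # One split violin per generation, with a KDE fitted per layer.
        sns.violinplot(data=history, x="generation", y="neurons",
                       hue="layer", split=True)
        plt.xlabel("Generations")
        plt.ylabel("Number of neurons")
        plt.show()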

Figure 4.23: 15 most used number of neurons in GBP/USD 1st FNN layer

Figure 4.24: 15 most used number of neurons in GBP/USD 2nd FNN layer

For the USD/CHF currency pair, two bar plots with the 15 most frequent gene values were also created (fig.
4.25 and fig. 4.26). The achieved results present a different behavior when compared to the one obtained for
the GBP/USD market, with the GA usually picking a high number of neurons for the first FNN layer and a small
number of neurons for the second FNN layer. However, the values selected for the final solution do not
follow this trend, with 32 for the first layer and 40 for the second one. Moreover, the values selected for the
final solution do not correspond to the most frequent genes during the optimization. For both layers, they differ
from the spotted tendency, which may confirm the hypothesis that the individual chosen by the GA represents a
minority in terms of architecture, and the GA convergence did not follow that direction.

Figure 4.25: 15 most used number of neurons in USD/CHF 1st FNN layer

Figure 4.26: 15 most used number of neurons in USD/CHF FNN 2nd layer

The topological convergence of the USD/CHF market can be seen in fig. A.2, where we can see that,
as individuals evolve throughout the optimization procedure, the tendency is to select a higher number of
neurons for the first layer and a lower number of neurons for the second one. We can also confirm that the
values achieved in the proposed solution are not the most frequent in the generation where its individual was
created. If we look at the distributions available in the fifteenth generation, it is possible to notice that the KDE
curve is extremely small in the areas corresponding to the number of neurons on the first and second layers.
However, we can clearly spot the influence of that individual in the convergence of the algorithm in the following
GA generations. After the 15th generation, the KDE curves that accommodate 32 and 40 (for the first and
second layer respectively) become progressively larger towards the 20th generation.

Chapter 5

Conclusions

In this thesis a financial forecasting system was presented that combines Evolutionary Computing with Deep
Learning, in order to provide a trading strategy capable of maximizing the obtained returns while minimizing
the associated investment risk. The baseline model for the developed system was a Feedforward
Neural Network, optimized by a Genetic Algorithm. To test the developed system, 5 different FOREX currency
pairs were selected: EUR/USD, GBP/USD, GBP/JPY, USD/JPY and USD/CHF. This provided a way to assess
the system's capacity to adapt to distinct market conditions, since each currency pair index exhibited its own
particular properties throughout the chosen data sample time span, 5 years of hourly data.
To create more data to feed the developed system, a vast number of Technical Indicators, mathematical
formulations that account for past market variations, were used as feature generation tools. The main idea
behind this decision was based on the premise that, if there are traders that can consistently beat the market
using Technical Analysis, then an expert system that learns through the usage of past data should also be
capable of getting some insights about the market's current behavior, creating educated guesses about how
it is going to evolve in the near future. Therefore, as a consequence of this choice, the system deeply relies on
past data to create future predictions, based on the assumption that markets are inefficient. This is in direct
opposition to the Efficient Market Hypothesis, which states that markets are completely random, a belief that
we tried to refute throughout the course of this work.
Following this methodology we were able to achieve promising results in some of the tested currency pairs.
A simple prediction model without optimization was created in order to check the behavior of each market
under purely random internal parameter selection. Two versions of this test case were performed, one
with Batch Normalization and another without it. Results showed that on all tested markets except the
GBP/USD, the usage of Batch Normalization improves the predictive performance and the achieved returns
(section 4.4). Therefore, backed by such results, we decided to proceed with the optimization case studies
including a Batch Normalization layer in the FNN architecture. For these case studies (sections 4.5 and
4.6), the best results were obtained using ROI as the GA fitness function, instead of the traditional
predictive performance measurement given by ACC. By testing both, we were able to conclude that when
optimizing the system with ACC, we create models that attempt to make a huge number of transactions in a
short period of time. Due to their enhanced predictive capacities, they are able to predict small market
variations more accurately, which ultimately leads them to negative returns, because of the wrong replication
of that behavior in distinct periods of data.

We were able to come up with two different solutions, one for the GBP/USD market and another for
the USD/CHF. For the GBP/USD we were able to surpass the Buy&Hold strategy by achieving an average ROI
of 14.19%, and a 17.81% ROI for the best achieved individual, against the 10.69% achieved by the Buy&Hold.
Regarding the USD/CHF currency pair, we were able to outperform the Sell&Hold strategy, which obtained a ROI
of 4.09%, by achieving an average ROI of 4.45% and a ROI of 6.35% for the best seen individual. Such results
show that it is possible to extract some profit out of market trading by resorting to previous periods of
data, indicating that markets are not completely efficient.
Finally, in the last 3 subsections of chapter 4.7, we conducted an extended analysis on the two achieved
solutions to understand the convergence of the GA in terms of features, fitness and network topology. The
achieved results show that the algorithm converges to improved values of validation ROI, at the same time
reducing the number of used features, normally preferring TIs that account for long periods of information.

5.1 Future Work

As a follow-up to the presented work, a series of distinct directions can be explored in order to try to improve
the developed system. Some of the most relevant are presented next:

• Introduction of a leverage mechanism. This would be interesting in order to evaluate how the system
would deal with the potential risk that its usage could bring.

• Introduction of a dynamic GA. There are several implementations of this algorithm, but the main idea is to
give the GA the possibility to dynamically adapt the mutation and crossover rates according to the fitness
values given by the current generation. This would benefit the overall convergence of the algorithm
towards a global minimum/maximum.

• Test more fitness functions, especially ones related to the financial domain. A good test would be to try
the Sharpe ratio, an indicator that attempts to measure the risk-adjusted return.

• Introduction of time series cross validation with an expanding window to better assess the model's
predictive power (a minimal sketch of such splits is given after this list).

• Explore the usage of LSTM and GRU neural networks, since they account for temporal
dependencies.

• Unsupervised Learning for feature extraction.
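
As a minimal sketch of the expanding-window idea, the snippet below uses scikit-learn's TimeSeriesSplit; scikit-learn is not part of the current system, so this is purely illustrative:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(20).reshape(-1, 1)  # toy stand-in for the hourly feature matrix
    for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
        # Each fold trains on an expanding window of past observations and
        # validates on the hours that immediately follow it.
        print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")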

Bibliography

[1] B. G. Malkiel. The efficient market hypothesis and its critics. Journal of economic perspectives, 2003.

[2] E. F. Fama. Efficient capital markets: A review of theory and empirical work. The journal of Finance, 1970.

[3] R. C. Cavalcante, R. C. Brasileiro, V. L. Souza, J. P. Nobrega, and A. L. Oliveira. Computational intelligence
and financial markets: A survey and future directions. Expert Systems with Applications, 2016.

[4] M. Casson and J. S. Lee. The origin and development of markets: A business history perspective.
Business History Review, 2011.

[5] A. Lunde and A. Timmermann. Duration dependence in stock prices: An analysis of bull and bear markets.
Journal of Business & Economic Statistics, 2004.

[6] Bank for International Settlements. Triennial central bank survey of foreign exchange and OTC derivatives markets in 2016, 2016.

[7] R. Cont. Empirical properties of asset returns: stylized facts and statistical issues. 2001.

[8] J. Arlt and M. Arltová. Financial time series and their features. Acta Oeconomica Pragensia.

[9] M. P. Taylor and H. Allen. The use of technical analysis in the foreign exchange market. Journal of
international Money and Finance, 1992.

[10] F. B. Matos. Ganhar em Bolsa. Leya, 2015.

[11] Y. Zhu and G. Zhou. Technical analysis: An asset allocation perspective on the use of moving averages.
Journal of Financial Economics, 2009.

[12] J. J. Murphy. Study Guide for Technical Analysis of the Futures Markets: A Self-training Manual. New
York institute of finance New York, 1987.

[13] A. Gorgulho, R. Neves, and N. Horta. Applying a ga kernel on optimizing technical analysis rules for stock
picking and portfolio composition. Expert systems with Applications, 2011.

[14] A. Bakhach, E. Tsang, and W. L. Ng. Forecasting directional changes in financial markets. Technical
report, 2015.

[15] R. S. Michalski, J. G. Carbonell, and T. M. Mitchell. Machine learning: An artificial intelligence approach.
Springer Science & Business Media, 2013.

[16] P. Domingos. The master algorithm: How the quest for the ultimate learning machine will remake our
world. Basic Books, 2015.

[17] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain.
Psychological review, 1958.

[18] M. A. Nielsen. Neural networks and deep learning. Determination Press, 2015.

[19] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators.
Neural networks, 1989.

[20] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In Neural networks: Tricks of the
trade. Springer, 1998.

[21] P. J. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE,
1990.

[22] J. Kiefer, J. Wolfowitz, et al. Stochastic estimation of the maximum of a regression function. The Annals
of Mathematical Statistics, 1952.

[23] T. Bäck, D. B. Fogel, and Z. Michalewicz. Evolutionary computation 1: Basic algorithms and operators.
CRC press, 2000.

[24] R. Storn. On the usage of differential evolution for function optimization. In Proceedings of the 1996
Biennial Conference of the North American Fuzzy Information Processing Society (NAFIPS). IEEE, 1996.

[25] K. Jebari and M. Madiafi. Selection methods for genetic algorithms. International Journal of Emerging
Sciences, 2013.

[26] J. Yao and C. L. Tan. A case study on using neural networks to perform technical forecasting of forex.
Neurocomputing, 2000.

[27] Y. Kara, M. Acar Boyacioglu, and O. K. Baykan. Predicting direction of stock price index movement using
artificial neural networks and support vector machines. Expert Syst. Appl., 2011.

[28] J. Patel, S. Shah, P. Thakkar, and K. Kotecha. Predicting stock market index using fusion of machine
learning techniques. Expert Systems with Applications, 2015.

[29] M. Qiu, Y. Song, and F. Akagi. Application of artificial neural network for the prediction of stock market
returns: The case of the japanese stock market. Chaos, Solitons & Fractals, 2016.

[30] T. Fischer and C. Krauss. Deep learning with long short-term memory networks for financial market
predictions. European Journal of Operational Research, 2017.

[31] C.-H. Wu, G.-H. Tzeng, Y.-J. Goo, and W.-C. Fang. A real-valued genetic algorithm to optimize the
parameters of support vector machine for predicting bankruptcy. Expert systems with applications, 2007.

[32] O. B. Sezer, M. Ozbayoglu, and E. Dogdu. A deep neural-network based stock trading system based on
evolutionary optimized technical analysis parameters. Procedia Computer Science, 2017.

[33] Y. Perwej and A. Perwej. Prediction of the bombay stock exchange (bse) market returns using artificial
neural network and genetic algorithm. Journal of Intelligent Learning Systems and Applications, 2012.

[34] A. Gorgulho, R. Neves, and N. Horta. Applying a ga kernel on optimizing technical analysis rules for stock
picking and portfolio composition. Expert systems with Applications, 2011.

[35] A. Hirabayashi, C. Aranha, and H. Iba. Optimization of the trading rule in foreign exchange using genetic
algorithm. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation. ACM,
2009.

[36] G. Rossum. Python reference manual. Technical report, 1995.

[37] W. McKinney. Pandas, python data analysis library. 2015. Reference Source, 2014.

[38] K. J. Magnuson. Pyti, 2017.

[39] S. J. Brown and J. B. Warner. Using daily stock returns: The case of event studies. Journal of financial
economics, 1985.

[40] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In Neural networks: Tricks of the
trade. Springer, 2012.

[41] F. Chollet et al. Keras, 2015.

[42] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagné. DEAP: Evolutionary algorithms
made easy. Journal of Machine Learning Research, 2012.

[43] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal
covariate shift. arXiv preprint arXiv:1502.03167, 2015.

[44] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of
the 27th international conference on machine learning (ICML-10), 2010.

[45] Y. Bengio. Practical recommendations for gradient-based training of deep architectures. In Neural net-
works: Tricks of the trade. Springer, 2012.

[46] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,
2014.

[47] M. Waskom, O. Botvinnik, D. O’Kane, P. Hobson, J. Ostblom, S. Lukauskas, D. C. Gemperline,
T. Augspurger, Y. Halchenko, J. B. Cole, J. Warmenhoven, J. de Ruiter, C. Pye, S. Hoyer, J. Vanderplas,
S. Villalba, G. Kunter, E. Quintero, P. Bachant, M. Martin, K. Meyer, A. Miles, Y. Ram, T. Brunner,
T. Yarkoni, M. L. Williams, C. Evans, C. Fitzgerald, Brian, and A. Qalieh. mwaskom/seaborn: v0.9.0 (july
2018), 2018.

Appendix A

Topology evolution plots

In this appendix we provide two plots related to the last subsection of this work (section 4.7.7). The plots were
created with the help of Seaborn [47], a Python statistical data visualization library.

Figure A.1: Evolution of the number of neurons in GBP/USD (violin plot per generation; legend: Layer 1, Layer 2)


Figure A.2: Evolution of the number of neurons in USD/CHF (violin plot per generation; legend: Layer 1, Layer 2)

