ML4Finance

Implementation and comparison of Machine Learning methods for log-returns forecasting.

Intro

Getting started

Project description

Over the last few years, cryptocurrencies have become increasingly popular as an investment product and as a portfolio diversification strategy. A growing body of literature examines the pertinence of the efficient market hypothesis (EMH). In essence, the EMH postulates that prices in efficient markets reflect all past information, all public information, or all public and private information. Testing the EMH matters to market participants because, if it holds, such information cannot be used to earn persistent trading profits.

In this context, we build on Wildi et al. (2019) and extend their approach to other cryptocurrencies, to commodities and to market indices, to see whether positive trading performance can be achieved with forecasts from machine learning models.

We here implement the following four machine learning methods to forecast the log-returns of several assets:

  • Feedforward Neural Network (NN)
  • Convolutional Neural Network (CNN)
  • Long Short-Term Memory (LSTM)
  • Random Forest (RF)

We further analyze whether combining several models trained on one asset (combination 1, C1) gives better results than the best single model, and whether combining the best model trained on different assets of the same type (combination 2, C2) gives better results than the best model trained on only one asset of that type. We evaluate all of them with several trading performance metrics.
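To make the idea of a combination concrete, here is a minimal sketch that assumes the combination is a plain average of the individual forecasts, turned into a long/short signal by its sign. The actual combination rule is described in report.pdf and may differ; `combine_forecasts`, `to_signal` and the sample numbers are hypothetical and only illustrate the principle.

```python
from statistics import mean


def combine_forecasts(forecasts_by_model):
    """Average the aligned per-period forecasts of several models.

    Assumption: equal-weight averaging. The project may use another rule.
    """
    lengths = {len(f) for f in forecasts_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("all models must forecast the same periods")
    return [mean(step) for step in zip(*forecasts_by_model.values())]


def to_signal(forecast):
    """Map a forecast to +1 (long), -1 (short) or 0 (flat) by its sign."""
    return (forecast > 0) - (forecast < 0)


# Made-up log-return forecasts for three periods, one list per model.
forecasts = {
    "nn": [0.010, -0.020, 0.005],
    "cnn": [0.030, -0.010, -0.015],
    "lstm": [0.020, 0.000, -0.005],
}

combined = combine_forecasts(forecasts)
signals = [to_signal(f) for f in combined]
print(combined, signals)
```

The same averaging would apply to combination 2 by passing the forecasts of the best model trained on each asset of a given type instead of the forecasts of different model types.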

Data

The data were collected from different websites, detailed below together with the asset symbols used in the code:

| Asset type         | Asset name  | Symbol    | Periods                  | Link |
|--------------------|-------------|-----------|--------------------------|------|
| Crypto-currency    | Bitcoin     | BTC-USD   | 2017-11-09 to 2022-12-13 | here |
| Crypto-currency    | Ether       | ETH-USD   | 2017-11-09 to 2022-12-13 | here |
| Crypto-currency    | Ripple      | XRP-USD   | 2017-11-09 to 2022-12-13 | here |
| Commodity          | Gold        | LBMA-GOLD | 2012-12-13 to 2022-12-09 | here |
| Commodity          | Natural Gas | NYMEX-NG  | 2012-12-13 to 2022-12-09 | here |
| Commodity          | Oil         | OPEC-ORB  | 2012-12-13 to 2022-12-09 | here |
| Stock market index | S&P500      | SP500     | 2012-12-13 to 2022-12-12 | here |
| Stock market index | SMI         | SMI       | 2012-12-13 to 2022-12-12 | here |
| Stock market index | CAC40       | CAC40     | 2012-12-13 to 2022-12-12 | here |

The raw data are already available in the data folder.

Report

All details about the choices made and the methodology used throughout this project are available in report.pdf. The report covers the assumptions, decisions and results of the project, and also explains the theoretical background.

Reproduce results

Requirements

  • python=3.10.8
  • pytorch=1.13.1
  • pandas=1.5.2
  • scikit-learn=1.1.3
  • tqdm=4.64.1
  • matplotlib=3.6.2
  • seaborn=0.12.1

Instructions to run

First, make sure that all the requirements are installed and that the data folder is in the project root.

The following commands give more details about the positional arguments and a description of the process done while running:

python process_data.py -h
python validation.py -h
python train.py -h
python test.py -h

Please run them before running the following commands. The commands shown below have to be executed in the order given to keep the results consistent.

The processed data can be reproduced from the raw data by moving to the src/ folder and executing:

python process_data.py dataset nb_lags train_ratio
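As a rough illustration of what such a processing step could involve, the sketch below computes log-returns from prices, builds samples from the previous `nb_lags` returns, and splits them chronologically with `train_ratio`. Only the argument names come from the command above; their interpretation here, the helper functions and the price series are assumptions, not the content of process_data.py. It uses plain Python so it runs without the project's dependencies.

```python
import math


def log_returns(prices):
    """Log-return r_t = ln(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def make_lagged_samples(returns, nb_lags):
    """Pair the previous nb_lags returns (features) with the next return (target)."""
    return [
        (returns[t - nb_lags:t], returns[t])
        for t in range(nb_lags, len(returns))
    ]


def chronological_split(samples, train_ratio):
    """Keep the first train_ratio share for training and the rest for testing.

    No shuffling, so the test set never precedes the training set in time.
    """
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 102.0, 105.0]

returns = log_returns(prices)
samples = make_lagged_samples(returns, nb_lags=2)
train, test = chronological_split(samples, train_ratio=0.75)
print(len(returns), len(samples), len(train), len(test))
```

Splitting by time rather than at random matters for return series: a random split would let the model train on observations that come after the ones it is tested on.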

To run the optimization on the validation set, move to the src/ folder and execute:

python validation.py model_type dataset

Beware that optimizing one model type on one dataset takes from 1 to 8 minutes (depending on the model) on Google Colab with a GPU.

To train the models with the best parameters found during the validation, move to the src/ folder and execute:

python train.py model_type dataset

To test the performance of the trained models, move to the src/ folder and execute:

python test.py model_type dataset

Results

Hit-Rate comparison for each asset

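| Asset    | B&H   | NN    | CNN   | LSTM  | RF    | C1    | C2    |
|----------|-------|-------|-------|-------|-------|-------|-------|
| Bitcoin  | 0.473 | 0.457 | 0.446 | 0.479 | 0.459 | 0.451 | 0.505 |
| Ethereum | 0.490 | 0.457 | 0.490 | 0.481 | 0.475 | 0.488 | 0.497 |
| Ripple   | 0.490 | 0.497 | 0.497 | 0.532 | 0.470 | 0.514 | 0.525 |
| Nat. gas | 0.516 | 0.518 | 0.519 | 0.494 | 0.498 | 0.505 | 0.502 |
| Gold     | 0.493 | 0.502 | 0.511 | 0.477 | 0.512 | 0.509 | 0.514 |
| Oil      | 0.562 | 0.492 | 0.508 | 0.527 | 0.522 | 0.525 | 0.536 |
| S&P500   | 0.564 | 0.553 | 0.554 | 0.558 | 0.548 | 0.572 | 0.533 |
| CAC40    | 0.548 | 0.520 | 0.526 | 0.528 | 0.498 | 0.504 | 0.520 |
| SMI      | 0.535 | 0.507 | 0.511 | 0.506 | 0.488 | 0.517 | 0.546 |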
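For readers unfamiliar with the metric, a hit rate is commonly defined as the share of periods in which the forecast has the same sign as the realized return. The sketch below uses that common definition; the exact definition used for the table is given in report.pdf and may differ in details such as the handling of zero returns. `hit_rate` and the sample values are hypothetical.

```python
def hit_rate(forecasts, realized):
    """Share of periods where forecast and realized return have the same sign.

    Periods where either value is exactly zero count as misses here.
    """
    if len(forecasts) != len(realized) or not forecasts:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(1 for f, r in zip(forecasts, realized) if f * r > 0)
    return hits / len(forecasts)


# Made-up forecasts and realized log-returns for four periods.
forecasts = [0.01, -0.02, -0.005, -0.01]
realized = [0.02, 0.01, 0.003, -0.004]

model_hit_rate = hit_rate(forecasts, realized)

# Under this definition, buy-and-hold always "forecasts" a positive return,
# so its hit rate is simply the share of periods with a positive return.
buy_and_hold_hit_rate = hit_rate([1.0] * len(realized), realized)
print(model_hit_rate, buy_and_hold_hit_rate)
```

A hit rate above 0.5 means the sign is forecast correctly more often than not, but it says nothing about the size of the gains and losses, which is why a return-based metric is reported as well.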

Sharpe ratio comparison for each asset

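| Asset    | B&H    | NN     | CNN    | LSTM   | RF     | C1     | C2     |
|----------|--------|--------|--------|--------|--------|--------|--------|
| Bitcoin  | -1.163 | -0.981 | -2.066 | -0.140 | -2.020 | -1.856 | 0.413  |
| Ethereum | -0.870 | -0.560 | -0.327 | 1.123  | 0.245  | 0.308  | -0.404 |
| Ripple   | -0.947 | -0.143 | 0.521  | 1.191  | -1.453 | 0.531  | 1.497  |
| Nat. gas | 0.743  | 0.523  | 1.107  | -0.308 | 0.352  | 0.577  | 0.615  |
| Gold     | 0.034  | -0.056 | -0.960 | -0.468 | 0.305  | -0.390 | 0.596  |
| Oil      | 0.686  | 0.389  | 0.652  | 0.994  | 1.012  | 0.810  | 1.049  |
| S&P500   | 0.518  | 0.833  | 0.801  | 0.768  | 0.090  | 1.136  | 0.551  |
| CAC40    | 0.597  | -0.192 | 0.702  | 0.499  | -0.674 | 0.169  | 0.685  |
| SMI      | 0.219  | -0.588 | -0.321 | -0.367 | -0.805 | -0.400 | 0.522  |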
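As a reminder of how such a figure is typically obtained, the sketch below computes an annualized Sharpe ratio for a sign-based long/short strategy. It assumes a zero risk-free rate, daily data and 252 trading periods per year; the conventions actually used for the table are described in report.pdf and may differ. `strategy_returns`, `sharpe_ratio` and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def strategy_returns(signals, realized):
    """Per-period return of holding position s_t (+1, -1 or 0) over return r_t."""
    return [s * r for s, r in zip(signals, realized)]


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio: sqrt(N) * mean / sample standard deviation.

    Assumptions: zero risk-free rate, N periods per year (252 by default).
    """
    volatility = stdev(returns)
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return math.sqrt(periods_per_year) * mean(returns) / volatility


# Made-up signals and realized log-returns for five periods.
signals = [1, -1, 1, 1, -1]
realized = [0.012, 0.004, -0.006, 0.009, -0.011]

model_sharpe = sharpe_ratio(strategy_returns(signals, realized))
buy_and_hold_sharpe = sharpe_ratio(realized)  # always long
print(model_sharpe, buy_and_hold_sharpe)
```

Cryptocurrencies trade every day of the year, so a value of 365 for `periods_per_year` may be more appropriate for them than 252.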


Authors

  1. Mery Tom, SCIPER: 297217 (tom.mery@epfl.ch)
  2. Lelièvre Maxime, SCIPER: 296777 (maxime.lelievre@epfl.ch)
  3. Peduto Matteo, SCIPER: 316194 (matteo.peduto@epfl.ch)