Intro
Getting started
Project description
Over the last few years, cryptocurrencies have become increasingly popular, both as an investment product and as part of portfolio diversification strategies. A growing body of literature has examined whether these markets satisfy the efficient market hypothesis (EMH). In essence, the EMH postulates that market prices fully reflect all available information: past market information (weak form), all public information (semi-strong form), or all public and private information (strong form). Testing the EMH matters to market participants because, if it holds, such information cannot be used to make persistent trading profits.
In this context, and following Wildi et al. (2019), we extend their approach to other cryptocurrencies as well as to commodities and stock market indices, to assess whether positive trading performance can be achieved with machine learning forecasts.
We implement the following four machine learning methods to forecast the log-returns of several assets:
- Feedforward Neural Network (NN)
- Convolutional Neural Network (CNN)
- Long-Short-Term-Memory (LSTM)
- Random Forest
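As a reminder of the quantity being forecast, the daily log-return of an asset with closing price P_t is r_t = ln(P_t / P_{t-1}). The minimal sketch below computes it with the standard library only; the function name `log_returns` is hypothetical and not taken from the repository.

```python
import math

def log_returns(prices):
    """Daily log-returns r_t = ln(P_t / P_{t-1}) of a price series (hypothetical helper)."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices[:-1], prices[1:])]

prices = [100.0, 110.0, 99.0]
print(log_returns(prices))
```

Log-returns are additive over time: their sum equals ln(P_last / P_first), here ln(99 / 100).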
We further analyze whether combining several models trained on the same asset (combination 1) performs better than the best single model, and whether combining the best models trained on different assets of the same type (combination 2) performs better than the best model trained on a single asset of that type. To answer these questions, we compare all approaches using several trading performance metrics.
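The combination rule itself is not specified here. One simple possibility, assumed purely for illustration, is to average the log-return forecasts of several models day by day and trade on the sign of the average. The function names `combine_forecasts` and `positions`, and the long/short rule, are hypothetical.

```python
from statistics import mean

def combine_forecasts(forecasts):
    """Average several models' log-return forecasts, day by day.

    `forecasts` holds one list of predicted log-returns per model.
    Equal-weight averaging is an ASSUMPTION; the project may combine models differently.
    """
    return [mean(day) for day in zip(*forecasts)]

def positions(forecast):
    # Assumed trading rule: long (+1) if the forecast is positive, short (-1) otherwise.
    return [1 if f > 0 else -1 for f in forecast]

nn = [0.010, -0.020, 0.004]
cnn = [0.006, 0.001, -0.008]
lstm = [-0.002, -0.010, 0.001]
combined = combine_forecasts([nn, cnn, lstm])
print(positions(combined))  # [1, -1, -1]
```

Under the same assumption, the averaging could also be applied to forecasts from the best models of different assets of the same type (combination 2), provided the forecasts are aligned on the same dates.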
Data
The data were collected from the websites listed below, together with the symbol used for each asset in the code:
Asset type | Asset name | Symbol | Periods | Link |
---|---|---|---|---|
Crypto-currency | Bitcoin | BTC-USD | 2017-11-09 to 2022-12-13 | here |
Crypto-currency | Ether | ETH-USD | 2017-11-09 to 2022-12-13 | here |
Crypto-currency | Ripple | XRP-USD | 2017-11-09 to 2022-12-13 | here |
Commodity | Gold | LBMA-GOLD | 2012-12-13 to 2022-12-09 | here |
Commodity | Natural Gas | NYMEX-NG | 2012-12-13 to 2022-12-09 | here |
Commodity | Oil | OPEC-ORB | 2012-12-13 to 2022-12-09 | here |
Stock market index | S&P500 | SP500 | 2012-12-13 to 2022-12-12 | here |
Stock market index | SMI | SMI | 2012-12-13 to 2022-12-12 | here |
Stock market index | CAC40 | CAC40 | 2012-12-13 to 2022-12-12 | here |
The raw data are already available in the data folder.
Report
All the details about the choices made and the methodology used throughout this project are available in report.pdf. The report explains the assumptions, decisions and results of the project, as well as its theoretical background.
Reproduce results
Requirements
- python=3.10.8
- pytorch=1.13.1
- pandas=1.5.2
- scikit-learn=1.1.3
- tqdm=4.64.1
- matplotlib=3.6.2
- seaborn=0.12.1
Instructions to run
First, make sure that all the requirements are installed and that the data folder is located in the repository root.
The following commands describe the positional arguments of each script and the process it runs:
python process_data.py -h
python validation.py -h
python train.py -h
python test.py -h
Please run them before running the commands below, which must be executed in the order shown to keep the results consistent.
To reproduce the processed data from the raw data, move to the src/ folder and execute:
python process_data.py dataset nb_lags train_ratio
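`process_data.py` takes `nb_lags` and `train_ratio` as arguments. A common way to use such parameters, assumed here rather than taken from the script, is to turn the log-return series into rows of `nb_lags` past values paired with the next value, and to split the rows chronologically without shuffling. The helpers below are hypothetical, and the sketch omits the validation set the project also uses.

```python
def make_lagged_dataset(returns, nb_lags):
    """Build (X, y): each row of X holds the `nb_lags` previous log-returns,
    and the matching entry of y is the log-return that follows them."""
    X, y = [], []
    for t in range(nb_lags, len(returns)):
        X.append(returns[t - nb_lags:t])
        y.append(returns[t])
    return X, y

def chronological_split(X, y, train_ratio):
    """Split without shuffling so that all training rows precede all test rows."""
    n_train = int(len(X) * train_ratio)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

returns = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02]
X, y = make_lagged_dataset(returns, nb_lags=2)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_ratio=0.75)
print(len(X_train), len(X_test))  # 3 1
```

Keeping the split chronological avoids training on observations that lie after the test period, which would leak future information into the forecasts.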
To run the optimization on the validation set, move to the src/ folder and execute:
python validation.py model_type dataset
Beware that optimizing one model type on one dataset takes between 1 and 8 minutes (depending on the model) on Google Colab with a GPU.
To train the models with the best parameters found during validation, move to the src/ folder and execute:
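The search strategy used by `validation.py` is not described here. As an assumption, the sketch below shows a plain exhaustive grid search that keeps the hyperparameters with the best validation score. `score_fn`, `toy_score` and the parameter names `lr` and `hidden` are hypothetical stand-ins for "train with these parameters, then evaluate on the validation set".

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Return the parameter combination with the highest validation score.

    `param_grid` maps each hyperparameter name to its candidate values;
    `score_fn` is a HYPOTHETICAL callable returning a validation score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Toy objective that peaks at lr=0.01, hidden=32.
    return -abs(params["lr"] - 0.01) - abs(params["hidden"] - 32) / 100

grid = {"lr": [0.001, 0.01, 0.1], "hidden": [16, 32, 64]}
print(grid_search(grid, toy_score))
```

The number of trainings grows with the product of the candidate list sizes, which is one reason why run times differ so much between model types.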
python train.py model_type dataset
To test the performance of the trained models, move to the src/ folder and execute:
python test.py model_type dataset
Results
Hit-rate comparison for each asset (B&H: buy-and-hold; RF: Random Forest; C1, C2: combinations 1 and 2)
Asset | B&H | NN | CNN | LSTM | RF | C1 | C2 |
---|---|---|---|---|---|---|---|
Bitcoin | 0.473 | 0.457 | 0.446 | 0.479 | 0.459 | 0.451 | 0.505 |
Ethereum | 0.490 | 0.457 | 0.490 | 0.481 | 0.475 | 0.488 | 0.497 |
Ripple | 0.490 | 0.497 | 0.497 | 0.532 | 0.470 | 0.514 | 0.525 |
Nat. gas | 0.516 | 0.518 | 0.519 | 0.494 | 0.498 | 0.505 | 0.502 |
Gold | 0.493 | 0.502 | 0.511 | 0.477 | 0.512 | 0.509 | 0.514 |
Oil | 0.562 | 0.492 | 0.508 | 0.527 | 0.522 | 0.525 | 0.536 |
S&P500 | 0.564 | 0.553 | 0.554 | 0.558 | 0.548 | 0.572 | 0.533 |
CAC40 | 0.548 | 0.520 | 0.526 | 0.528 | 0.498 | 0.504 | 0.520 |
SMI | 0.535 | 0.507 | 0.511 | 0.506 | 0.488 | 0.517 | 0.546 |
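The hit rate is commonly defined as the share of days on which the forecast has the same sign as the realized log-return; this definition is assumed here, as are the tie handling and the helper name `hit_rate`. Treating buy-and-hold as a constant positive forecast makes its hit rate equal to the share of up days, which is one way to read the B&H column.

```python
def hit_rate(predicted, realized):
    """Share of days on which the forecast and the realized log-return have the same sign.

    Zero values are treated as non-positive (an assumption about tie handling).
    """
    if len(predicted) != len(realized) or not realized:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, r in zip(predicted, realized) if (p > 0) == (r > 0))
    return hits / len(realized)

realized = [0.01, -0.02, 0.03, -0.01]
model_forecast = [0.002, -0.001, -0.004, -0.003]
buy_and_hold = [1.0] * len(realized)  # always long: a rise is "forecast" every day

print(hit_rate(model_forecast, realized))  # 0.75
print(hit_rate(buy_and_hold, realized))    # 0.5
```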
Sharpe ratio comparison for each asset
Asset | B&H | NN | CNN | LSTM | RF | C1 | C2 |
---|---|---|---|---|---|---|---|
Bitcoin | -1.163 | -0.981 | -2.066 | -0.140 | -2.020 | -1.856 | 0.413 |
Ethereum | -0.870 | -0.560 | -0.327 | 1.123 | 0.245 | 0.308 | -0.404 |
Ripple | -0.947 | -0.143 | 0.521 | 1.191 | -1.453 | 0.531 | 1.497 |
Nat. gas | 0.743 | 0.523 | 1.107 | -0.308 | 0.352 | 0.577 | 0.615 |
Gold | 0.034 | -0.056 | -0.960 | -0.468 | 0.305 | -0.390 | 0.596 |
Oil | 0.686 | 0.389 | 0.652 | 0.994 | 1.012 | 0.810 | 1.049 |
S&P500 | 0.518 | 0.833 | 0.801 | 0.768 | 0.090 | 1.136 | 0.551 |
CAC40 | 0.597 | -0.192 | 0.702 | 0.499 | -0.674 | 0.169 | 0.685 |
SMI | 0.219 | -0.588 | -0.321 | -0.367 | -0.805 | -0.400 | 0.522 |
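A minimal sketch of an annualized Sharpe ratio for a sign-based strategy, under stated assumptions: positions of +1 (long) or -1 (short), a zero risk-free rate, no transaction costs, and 365 trading periods per year for cryptocurrencies (252 would be the usual choice for the other assets). These choices and the helper names are hypothetical and may differ from the report.

```python
import math
from statistics import mean, stdev

def strategy_returns(positions, realized):
    """Daily log-returns of a strategy holding `positions` (+1 long, -1 short)."""
    return [p * r for p, r in zip(positions, realized)]

def annualized_sharpe(returns, periods_per_year):
    """Annualized Sharpe ratio, assuming a zero risk-free rate and no transaction costs."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)

realized = [0.01, -0.02, 0.03, -0.01, 0.005]
positions = [1, -1, 1, 1, -1]        # e.g. the sign of each day's forecast
buy_and_hold = [1] * len(realized)   # always long

print(annualized_sharpe(strategy_returns(positions, realized), 365))
print(annualized_sharpe(strategy_returns(buy_and_hold, realized), 365))
```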
Authors
- Mery Tom, SCIPER: 297217 (tom.mery@epfl.ch)
- Lelièvre Maxime, SCIPER: 296777 (maxime.lelievre@epfl.ch)
- Peduto Matteo, SCIPER: 316194 (matteo.peduto@epfl.ch)