Intro
Getting started
Project description
The aim of this machine learning project is to predict whether a decay signature comes from a Higgs boson or from some other particle.
The model takes as input a vector of features describing a collision event between two high-speed protons. More details about the project are available in references/project1_description.pdf. A regularized logistic regression is implemented and trained on 8 subsets of the full dataset, with one sub-model per subset.
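As an illustration of the technique named above, here is a minimal sketch of an L2-regularized logistic regression trained with full-batch gradient descent in NumPy. It is not the code in run.py: the function names, the hyperparameters (lambda_, gamma, epochs), the {0, 1} label encoding and the synthetic toy data are all assumptions made for this example.

```python
import numpy as np


def sigmoid(t):
    """Logistic function; the input is clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(t, -500, 500)))


def reg_logistic_loss(y, tx, w, lambda_):
    """Mean negative log-likelihood plus an L2 penalty (labels assumed in {0, 1})."""
    z = tx @ w
    # log(1 + exp(z)) is computed stably with logaddexp.
    return np.mean(np.logaddexp(0.0, z) - y * z) + lambda_ * np.sum(w ** 2)


def reg_logistic_gradient(y, tx, w, lambda_):
    """Gradient of reg_logistic_loss with respect to w."""
    return tx.T @ (sigmoid(tx @ w) - y) / len(y) + 2.0 * lambda_ * w


def train(y, tx, lambda_=1e-3, gamma=0.1, epochs=1000):
    """Full-batch gradient descent starting from zero weights (hypothetical defaults)."""
    w = np.zeros(tx.shape[1])
    for _ in range(epochs):
        w -= gamma * reg_logistic_gradient(y, tx, w, lambda_)
    return w


# Toy usage on synthetic data (not the Higgs dataset).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = (x[:, 0] + x[:, 1] > 0).astype(float)
tx = np.c_[np.ones(len(x)), x]  # prepend a bias column
w = train(y, tx)
print("final loss:", reg_logistic_loss(y, tx, w, 1e-3))
```

The penalty term lambda_ * ||w||^2 keeps the weights from growing without bound, which matters when a subset of the data is nearly linearly separable.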
Data
The dataset comes from a recent, popular machine learning challenge, finding the Higgs boson, which uses original data from CERN. The dataset is available at https://www.aicrowd.com/challenges/epfl-machine-learning-higgs. To reproduce the results, a folder data/ should be added to the repo, as described in Repo Architecture. A detailed description of the dataset is available in references/The_Higgs_boson_ML_challenge.pdf.
Report
All details about the choices made and the methodology used throughout this project are available in report.pdf. The report explains the assumptions, decisions, and results of the project.
Reproduce results
Requirements
- Python==3.9.13
- Numpy==1.21.5
- Matplotlib
Instructions to run
Move to the root folder and execute:
python run.py
Make sure that all the requirements are installed and that the data folder is in the root. Be aware that training the models for 1000 epochs takes around 5 min on an Apple silicon M1 Pro. The best model was trained for 15000 epochs.
If you want to run the cross-validation, move to the root folder and execute:
python optimization.py
The cross-validation took around 1 hour for one sub-model (on an Apple silicon M1 Pro), and therefore around 8 hours for the whole model.
If you want to visualize the performance of the model during training, move to the root folder and execute:
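For readers unfamiliar with the procedure, the following is a generic sketch of k-fold cross-validation in NumPy. It does not reproduce optimization.py: the helper names (build_k_indices, cross_validate), the fit/score callables, the number of folds and the toy least-squares problem are hypothetical and chosen only to keep the example short.

```python
import numpy as np


def build_k_indices(n_samples, k_fold, seed=1):
    """Shuffle the sample indices and split them into k_fold equal-sized folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    fold_size = n_samples // k_fold
    # Samples beyond k_fold * fold_size are dropped so all folds have the same size.
    return indices[: k_fold * fold_size].reshape(k_fold, fold_size)


def cross_validate(y, tx, k_fold, fit, score, seed=1):
    """Average a validation score over k_fold train/validation splits."""
    k_indices = build_k_indices(len(y), k_fold, seed)
    scores = []
    for k in range(k_fold):
        val_idx = k_indices[k]
        train_idx = np.delete(k_indices, k, axis=0).ravel()
        model = fit(y[train_idx], tx[train_idx])
        scores.append(score(y[val_idx], tx[val_idx], model))
    return float(np.mean(scores))


# Toy usage: noiseless linear data fitted by least squares.
tx = np.c_[np.ones(20), np.arange(20.0)]
y = tx @ np.array([1.0, 2.0])
fit = lambda y_tr, tx_tr: np.linalg.lstsq(tx_tr, y_tr, rcond=None)[0]
score = lambda y_val, tx_val, w: float(np.mean((y_val - tx_val @ w) ** 2))
cv_mse = cross_validate(y, tx, k_fold=4, fit=fit, score=score)
print("cross-validated MSE:", cv_mse)
```

Repeating this loop for every candidate hyperparameter value multiplies the training cost by the number of folds and candidates, which is consistent with the long run time reported above.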
python plot_performance.py
Results
The performance of the models is assessed on AIcrowd from data/submission.csv, generated by run.py. The model achieves a global accuracy of 0.818 with an F1-score of 0.722.
Here is the performance of each sub-model during training:
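The two reported metrics can be computed as in the sketch below. This is an illustrative helper, not the platform's scoring code; the function name accuracy_and_f1 and the {1, -1} label encoding (1 for the positive class) are assumptions of this example.

```python
import numpy as np


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1-score) for binary labels; `positive` marks the signal class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    accuracy = float(np.mean(y_true == y_pred))
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    f1 = float(2 * tp / denom) if denom > 0 else 0.0
    return accuracy, f1


# Toy usage with made-up labels.
y_true = [1, -1, 1, 1, -1, -1]
y_pred = [1, -1, -1, 1, 1, -1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Accuracy counts all correct predictions, while the F1-score only looks at the positive class, so the two can differ noticeably when the classes are imbalanced.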

Authors
- Mery Tom, SCIPER: 297217 (tom.mery@epfl.ch)
- Lelièvre Maxime, SCIPER: 296777 (maxime.lelievre@epfl.ch)
- Peduto Matteo, SCIPER: 316194 (matteo.peduto@epfl.ch)