This module contains the functions for training, validating and testing the neural networks.
If the user intends to load pickles of saved DeepLearning objects or model .pth files, it is important to remember that the models must be loaded in the same computational environment they were initialised in, both in terms of parallelisation and the processing units they are loaded on.
For example, if a model was trained on 16 GPUs in parallel, it must be loaded on 16 GPUs in parallel. This is a requirement of PyTorch's serialization routines.
Author: Oliver Boom (GitHub alias: OliverJBoom)
DeepLearning(model, data_X, data_y, optimiser, batch_size=128, n_epochs=100, loss_function=<sphinx.ext.autodoc.importer._MockObject object>, device='cpu', seed=42, debug=True, disp_freq=20, fig_disp_freq=50, early_stop=True, early_verbose=False, patience=50, rel_tol=0, scaler_data_X=None, scaler_data_y=None)
Class to perform training and validation for a given model.
- model (nn.Module) – The neural network model
- data_X (np.array) – The training dataset
- data_y (np.array) – The target dataset
- n_epochs (int) – The number of epochs of training
- optimiser (torch.optim) – The type of optimiser used
- batch_size (int) – The batch size
- loss_function (torch.nn.modules.loss) – The loss function used
- device (string) – The device to run on (CPU or CUDA)
- seed (int) – The number that is set for the random seeds
- debug (bool) – Whether to print some parameters for checking
- disp_freq (int) – How often, in epochs, the training/validation metrics are printed
- fig_disp_freq (int) – How often, in epochs, the training/validation prediction figures are made
- early_stop (bool) – Whether early stopping is utilized
- early_verbose (bool) – Whether to print out the early stopping counter
- patience (int) – The number of epochs without improvement before stopping
- rel_tol (float) – The relative improvement percentage that must be achieved
- scaler_data_X (sklearn.preprocessing.data.MinMaxScaler) – The data X scaler object for inverse scaling
- scaler_data_y (sklearn.preprocessing.data.MinMaxScaler) – The data y scaler object for inverse scaling
Forms the iterators that feed in the data and labels.
Evaluates the performance of the network on given data for a given model.
Much of the code overlaps with validation. The two are kept separate only because this makes it easier to inspect attributes when running simulations.
- model (nn.Module) – The model to evaluate
- test_loader (torch.utils.data.dataloader.DataLoader) – The iterator that feeds in the data of choice
Returns: The error metric for that dataset
Plots the training predictions, validation predictions and the training/validation losses as they are predicted.
Checks the size of the datasets
Performs a single training epoch and returns the loss metric for the training dataset.
Parameters: train_loader (torch.utils.data.dataloader.DataLoader) – The iterator that feeds in the training data
Returns: The error metric for that epoch
Return type: float
Splits the DataFrames into training, validation and test sets and creates torch tensors from the underlying numpy arrays.
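A minimal sketch of a three-way split of this kind, assuming an ordered (unshuffled) split. The helper name `split_slices` and its fraction parameters are hypothetical; the ratios the module actually uses are not stated here. Converting each piece to a tensor would follow as a separate step.

```python
def split_slices(n_samples, val_frac=0.25, test_frac=0.25):
    """Return (train, val, test) slices over n_samples ordered rows.

    Hypothetical helper: the fractions are illustrative defaults, not the
    values used by the module.
    """
    n_val = int(n_samples * val_frac)
    n_test = int(n_samples * test_frac)
    n_train = n_samples - n_val - n_test
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n_samples)
    return train, val, test


# The slices work on anything sliceable: lists, numpy arrays, torch tensors.
data = list(range(8))
train, val, test = split_slices(len(data))
train_set, val_set, test_set = data[train], data[val], data[test]
print(train_set, val_set, test_set)
```

Returning slices rather than copies keeps the sketch independent of the container type, so the same indices can be applied to both the features and the targets.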
The wrapper that performs the training and validation
Evaluates the performance of the network on unseen validation data.
Parameters: val_loader (torch.utils.data.dataloader.DataLoader) – The iterator that feeds in the validation data
Returns: The error metric for that epoch
Return type: float
EarlyStopping(patience, rel_tol, verbose=True)
Used to facilitate early stopping during the training of neural networks.
When called, if the validation score has not improved by at least the relative tolerance set by the user, a counter is incremented. If the counter passes the patience value, the stop attribute is set to True. This should be used as a break condition in the training loop.
If rel_tol is set to 0, the metric just needs to improve on its existing value.
- patience (int) – The number of epochs without improvement before stopping
- rel_tol (float) – The relative improvement % that must be achieved
- verbose (bool) – Whether to print the count number
- best (float) – The best score achieved so far
- counter (int) – The number of epochs without improvement so far
- stop (bool) – Whether stopping criteria is achieved
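The behaviour described above can be sketched roughly as follows. This is an illustration, not the module's code: the class name `EarlyStoppingSketch` is hypothetical, and it assumes that lower scores are better (a non-negative loss), that rel_tol is a percentage, and that stopping triggers once the counter reaches patience.

```python
class EarlyStoppingSketch:
    """Illustrative early-stopping helper; not the module's actual code."""

    def __init__(self, patience, rel_tol, verbose=True):
        self.patience = patience
        self.rel_tol = rel_tol  # assumed to be a percentage, e.g. 1 means 1 %
        self.verbose = verbose
        self.best = None        # best (lowest) score seen so far
        self.counter = 0        # epochs without sufficient improvement
        self.stop = False

    def __call__(self, score):
        # Assumption: scores are non-negative and lower is better.
        # The first score always counts as an improvement.
        improved = (
            self.best is None
            or score < self.best * (1 - self.rel_tol / 100)
        )
        if improved:
            self.best = score
            self.counter = 0
        else:
            self.counter += 1
            if self.verbose:
                print(f"No improvement for {self.counter} epoch(s)")
            if self.counter >= self.patience:
                self.stop = True


# Stand-in validation losses; a real loop would compute these each epoch.
val_losses = [1.0, 0.8, 0.81, 0.79, 0.80, 0.82, 0.85]
stopper = EarlyStoppingSketch(patience=3, rel_tol=0, verbose=False)
for epoch, loss in enumerate(val_losses):
    stopper(loss)
    if stopper.stop:
        break  # the stop attribute is the break condition
```

With rel_tol=0 any strict decrease resets the counter, which matches the description of the metric only needing to improve on its existing value.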
full_save(model, model_name, optimiser, num_epoch, learning_rate, momentum, weight_decay, use_lg_returns, PCA_used, data_X, train_loss, val_loss, test_loss, train_time, hidden_dim, mse, mae, mde, path)
Saves the model's run details and hyper-parameters to a CSV file.
- model (nn.Module) – The model run
- model_name (string) – The name the model is saved under
- optimiser (torch.optim) – The optimiser type used
- num_epoch (int) – The number of epochs run for
- learning_rate (float) – The learning rate hyper-parameter
- momentum (float) – The momentum learning hyper-parameter
- weight_decay (float) – The weight decay learning hyper-parameter
- use_lg_returns (bool) – Whether log returns were used
- PCA_used (bool) – Whether PCA was used
- data_X (np.array) – The training dataset (used to save the shape)
- train_loss (float) – The loss on the training dataset
- val_loss (float) – The loss on the validation dataset
- test_loss (float) – The loss on the test dataset
- train_time (float) – The amount of time to train
- hidden_dim (int) – The number of neurons in the hidden layers
- mse (float) – The mean squared error metric
- mae (float) – The mean absolute error metric
- mde (float) – The mean direction error metric
- path (string) – The directory path to save in
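As a rough illustration of logging run details to a CSV file, the sketch below appends one row per run and writes the header only when the file is new. The helper `append_run_details`, the file name `run_log.csv` and the example values are all hypothetical, and only a few of the fields listed above are shown; the module's own column layout is not documented here.

```python
import csv
import os
import tempfile


def append_run_details(csv_path, details):
    """Append one row of run details, writing a header if the file is new."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(details))
        if is_new:
            writer.writeheader()
        writer.writerow(details)


# Example values are made up purely for illustration.
with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "run_log.csv")
    append_run_details(
        log_file,
        {"model_name": "run_a", "num_epoch": 100, "learning_rate": 0.01},
    )
    append_run_details(
        log_file,
        {"model_name": "run_b", "num_epoch": 200, "learning_rate": 0.001},
    )
    with open(log_file, newline="") as f:
        rows = list(csv.DictReader(f))
```

Appending rather than overwriting keeps every run in one file, which makes it easy to compare hyper-parameters across runs later.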
model_load(model_name, device, path='../Results/Pths/')
Loading function for the models.
- model_name (string) – The model name to load
- device (string) – The device to run on (CPU or CUDA)
- path (string) – The directory path to load the model from
model_save(model, name, path='../Results/Pths/')
Saving function for the model.
- model (torch.nn) – The model to save
- name (string) – The name to save the model under
- path (string) – The directory path to save the model in
Strips the key text information out of certain parameters. Used to record, as text, which model and optimiser objects were used.
Parameters: param (object) – The parameter object to find the name of
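One simple way to get a text label for an object is to take its class name. The sketch below assumes that is the information wanted; the helper `object_name` and the stand-in `Adam` class are hypothetical, and the module's own function may extract the text differently.

```python
def object_name(param):
    """Return a short text label for an object or class: its class name."""
    if isinstance(param, type):
        return param.__name__      # a class was passed directly
    return type(param).__name__    # an instance was passed


class Adam:
    """Stand-in class for illustration; not the real torch optimiser."""


print(object_name(Adam()))
print(object_name(Adam))
```

Both calls give the same label, so the logged text does not depend on whether a class or an instance was supplied.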
Sets the random seeds to ensure deterministic behaviour.
Parameters: seed (int) – The number that is set for the random seeds
Returns: Confirmation that seeds have been set
Return type: bool
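A minimal sketch of seeding the common random number generators, assuming the goal is repeatable runs across Python, NumPy and PyTorch. The function name `set_seed_sketch` is hypothetical, and the optional imports are guarded so the example also runs where NumPy or PyTorch is not installed; which generators the module itself seeds is an assumption.

```python
import random


def set_seed_sketch(seed):
    """Seed the available random number generators and return True."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        # Trade some speed for repeatable cuDNN behaviour.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
    return True


set_seed_sketch(42)
first = random.random()
set_seed_sketch(42)
second = random.random()
print(first == second)
```

Re-seeding with the same value reproduces the same draw, which is the property that makes repeated training runs comparable.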