EQTransformer.core.trainer module

Created on Wed Apr 25 17:44:14 2018

@author: mostafamousavi last update: 05/27/2021

EQTransformer.core.trainer.trainer(input_hdf5=None, input_csv=None, output_name=None, input_dimention=(6000, 3), cnn_blocks=5, lstm_blocks=2, padding='same', activation='relu', drop_rate=0.1, shuffle=True, label_type='gaussian', normalization_mode='std', augmentation=True, add_event_r=0.6, shift_event_r=0.99, add_noise_r=0.3, drop_channel_r=0.5, add_gap_r=0.2, coda_ratio=0.4, scale_amplitude_r=None, pre_emphasis=False, loss_weights=[0.05, 0.4, 0.55], loss_types=['binary_crossentropy', 'binary_crossentropy', 'binary_crossentropy'], train_valid_test_split=[0.85, 0.05, 0.1], mode='generator', batch_size=200, epochs=200, monitor='val_loss', patience=12, gpuid=None, gpu_limit=None, use_multiprocessing=True)[source]

Generate a model and train it.

Parameters:
  • input_hdf5 (str, default=None) – Path to an HDF5 file containing only one class of data: NumPy arrays of 3-component waveforms, each 1 minute long.
  • input_csv (str, default=None) – Path to a CSV file with one column (trace_name) listing the names of all datasets in the HDF5 file.
  • output_name (str, default=None) – Output directory.
  • input_dimention (tuple, default=(6000, 3)) – Dimension of the input traces: number of samples and number of components.
  • cnn_blocks (int, default=5) – The number of residual blocks of convolutional layers.
  • lstm_blocks (int, default=2) – The number of residual blocks of BiLSTM layers.
  • padding (str, default='same') – Padding type.
  • activation (str, default='relu') – Activation function used in the hidden layers.
  • drop_rate (float, default=0.1) – Dropout value.
  • shuffle (bool, default=True) – If True, the trace list is shuffled prior to training.
  • label_type (str, default='gaussian') – Labeling type: ‘gaussian’, ‘triangle’, or ‘box’.
  • normalization_mode (str, default='std') – Normalization mode for data preprocessing: ‘max’ (maximum amplitude among the three components) or ‘std’ (standard deviation).
  • augmentation (bool, default=True) – If True, data will be augmented simultaneously during the training.
  • add_event_r (float, default=0.6) – Rate of augmentation for adding a secondary event randomly into the empty part of a trace.
  • shift_event_r (float, default=0.99) – Rate of augmentation for randomly shifting the event within a trace.
  • add_noise_r (float, default=0.3) – Rate of augmentation for adding Gaussian noise with different SNRs into a trace.
  • drop_channel_r (float, default=0.5) – Rate of augmentation for randomly dropping one of the channels.
  • add_gap_r (float, default=0.2) – Rate of augmentation for adding an interval of zeros into the waveform, representing filled gaps.
  • coda_ratio (float, default=0.4) – Fraction of the S-P time by which the event/coda envelope is extended past the S pick.
  • scale_amplitude_r (float, default=None) – Rate of augmentation for randomly scaling the trace amplitude.
  • pre_emphasis (bool, default=False) – If True, waveforms will be pre-emphasized.
  • loss_weights (list, default=[0.05, 0.40, 0.55]) – Loss weights for detection, P picking, and S picking respectively.
  • loss_types (list, default=['binary_crossentropy', 'binary_crossentropy', 'binary_crossentropy']) – Loss types for detection, P picking, and S picking respectively.
  • train_valid_test_split (list, default=[0.85, 0.05, 0.10]) – Percentage of data split into the training, validation, and test sets respectively.
  • mode (str, default='generator') – Mode of running: ‘generator’ or ‘preload’.
  • batch_size (int, default=200) – Batch size.
  • epochs (int, default=200) – The number of epochs.
  • monitor (str, default='val_loss') – The measure used for monitoring the training.
  • patience (int, default=12) – The number of epochs without any improvement in the monitoring measure to automatically stop the training.
  • gpuid (int, default=None) – Id of the GPU used for training. Set to None to use the CPU.
  • gpu_limit (float, default=None) – Set the maximum percentage of memory usage for the GPU.
  • use_multiprocessing (bool, default=True) – If True, multiple CPUs will be used for the preprocessing of data even when a GPU is used for training.
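The three label_type options above can be illustrated with a small NumPy sketch. The make_label helper below is hypothetical (not EQTransformer's internal code); it only shows the general shape each labeling style gives a pick at a given sample index:

```python
import numpy as np

def make_label(onset, length=6000, label_type="gaussian", width=20):
    # Hypothetical helper (not EQTransformer's implementation): build a 1-D
    # target vector of `length` samples whose peak marks the pick at `onset`.
    t = np.arange(length)
    if label_type == "gaussian":
        # smooth bell curve centered on the pick
        return np.exp(-((t - onset) ** 2) / (2.0 * width ** 2))
    if label_type == "triangle":
        # linear ramp up to the pick and back down over `width` samples
        return np.clip(1.0 - np.abs(t - onset) / width, 0.0, 1.0)
    if label_type == "box":
        # flat window of ones around the pick
        return ((t >= onset - width) & (t <= onset + width)).astype(float)
    raise ValueError(f"unknown label_type: {label_type}")

gaussian = make_label(3000, label_type="gaussian")
box = make_label(3000, label_type="box")
```

All three peak at 1.0 on the pick sample; they differ only in how sharply the target decays away from it.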
Returns:

  • output_name/models/output_name_.h5 (This is where all good models will be saved.)
  • output_name/final_model.h5 (This is the full model for the last epoch.)
  • output_name/model_weights.h5 (These are the weights for the last model.)
  • output_name/history.npy (Training history.)
  • output_name/X_report.txt (A summary of the parameters used for training and the resulting performance.)
  • output_name/test.npy (A NumPy file containing the trace names of the test set.)
  • output_name/X_learning_curve_f1.png (The learning curve of F1-scores.)
  • output_name/X_learning_curve_loss.png (The learning curve of loss.)
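How the train_valid_test_split fractions partition the trace list, and what would end up in test.npy, can be sketched as follows. The split_traces helper is hypothetical (not EQTransformer's internal code) and only illustrates the default [0.85, 0.05, 0.10] split:

```python
import numpy as np

def split_traces(trace_names, fractions=(0.85, 0.05, 0.10), shuffle=True, seed=0):
    # Hypothetical sketch (not EQTransformer's implementation): partition
    # trace names into training, validation, and test subsets.
    names = list(trace_names)
    if shuffle:
        np.random.default_rng(seed).shuffle(names)  # shuffle prior to splitting
    n_train = round(fractions[0] * len(names))      # round to the nearest count
    n_valid = round(fractions[1] * len(names))
    return (names[:n_train],
            names[n_train:n_train + n_valid],
            names[n_train + n_valid:])

train, valid, test = split_traces([f"trace_{i}" for i in range(1000)])
# test here corresponds to the trace names saved in output_name/test.npy
```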

Notes

‘generator’ mode is memory efficient and more suitable for machines with fast disks. ‘preload’ mode is faster but requires more memory, and it supports only box labeling.
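For reference, the two normalization_mode options described in the parameter list can be sketched in NumPy. This is an assumed reading of this page, not EQTransformer's exact preprocessing code:

```python
import numpy as np

def normalize(trace, mode="std"):
    # Hypothetical sketch (assumed behavior, not EQTransformer's exact code)
    # of normalization_mode for a (n_samples, 3) trace.
    trace = trace - trace.mean(axis=0, keepdims=True)   # demean each channel
    if mode == "max":
        # one scalar: the maximum absolute amplitude among the three components
        scale = np.abs(trace).max()
        return trace / (scale if scale != 0 else 1.0)
    if mode == "std":
        # per-channel standard deviation
        scale = trace.std(axis=0, keepdims=True)
        scale[scale == 0] = 1.0   # guard against dead (constant) channels
        return trace / scale
    raise ValueError(f"unknown normalization_mode: {mode}")

waveform = np.arange(18, dtype=float).reshape(6, 3)   # toy 6-sample, 3-channel trace
```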