# Feeding the model
Previously, when we were learning how to generate datasets, it was noted that different types of data are not saved all together. Datasets are made and saved independently, to give the user the ability to combine them or use just fractions of them. There are many tools to be used with the data and the networks, which are all in the `mly.mlTools` directory.
## Calling the data you want to train with
After you have your model ready for training, you have to prepare the data to train it. Fortunately, the data generator functions already save the data in a format that will spare you the trouble of finding the right shape for them. All `.mat` files are dictionaries with the following keys and shapes:

```
data_mat_file['data']   --> (N, length, NumberOfDetectors)
data_mat_file['labels'] --> (N, NumberOfLabels)
```
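A minimal sketch of inspecting one of these files, assuming they are standard MATLAB-format files readable with `scipy.io.loadmat`; the file name here is hypothetical:

```python
from scipy.io import loadmat  # assumption: standard MATLAB-format .mat files

# Hypothetical file name; substitute one of your generated datasets
data_mat_file = loadmat('datasets/cbc/example_dataset.mat')

data = data_mat_file['data']      # shape: (N, length, NumberOfDetectors)
labels = data_mat_file['labels']  # shape: (N, NumberOfLabels)
print(data.shape, labels.shape)
```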
Every `.mat` file consists of data with the same label. To combine datasets you have to use the `data_fusion` function.
```python
data_fusion(names
            ,sizes=None
            ,save=False
            ,data_source_file=null_path+'/datasets/')
```
- `names`: The only mandatory input, which has to be a list with the paths of the dataset files to be merged. The origin of the paths is the datasets directory (it can be changed by changing `data_source_file`).
- `sizes`: If this is `None`, it will just merge the datasets. If not, it has to be a list of integers representing the sizes you want from each dataset. In that case, the length of that list has to be the same as the length of `names`.
- `save`: `False` if you don't want to save the merged dataset, and `True` in the rare occasions where you might want to save it.
Finally, it returns two variables: one is the data and the other is the labels. Those two variables can be the input of the training function, as you will see below and in the short sketch that follows.
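A minimal usage sketch, assuming two previously generated dataset files (the paths here are hypothetical):

```python
from mly.generators import data_fusion

# Hypothetical dataset paths, relative to the datasets directory
data, labels = data_fusion(['cbc/HLV_time_cbc_with_real_noise_SNR40_XM',
                            'noise/real/real1/HLV_time_real_noise_No1_XM']
                           ,sizes=[500, 500])  # take 500 samples from each file

print(data.shape)    # (1000, length, NumberOfDetectors)
print(labels.shape)  # (1000, NumberOfLabels)
```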
## Training the model
During training it is very important to save information about the performance of the network after every batch. Unfortunately, Keras does not have an automatic way to save the history of the training in a model object. For that reason I made a function that does the training and also returns the history of that training.
```python
hist = train_model(model
                   ,dataset
                   ,epoch
                   ,batch
                   ,split
                   ,classes=2
                   ,save_model=False
                   ,data_source_path=null_path+'/datasets/'
                   ,model_source_path=null_path+'/trainings/')
```
In this function the model is trained in the background using the Keras function `model.fit(...)`, and it returns a dictionary with the accuracy and loss of the validation and testing data (see the sketch after the parameter list). The parameters are:
- `model`: the model object, or a path to an already existing model that `model_source_path` should indicate.
- `dataset`: a list object with two elements `[data, labels]`, which can be provided by the `data_fusion` function, or a path to an existing dataset that `data_source_path` should indicate.
- `epoch`: the number of epochs to train, as it is called in the Keras `model.fit`.
- `batch`: the batch size, as it is called in the Keras `model.fit`.
- `split`: the proportion of training and testing data to use during training, as it is called in the Keras `model.fit`.
- `classes`: the number of classes used in the model.
- `save_model`: indicates whether you want to save the model. If you want to save it, you should give this variable a name for the saved model.
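A minimal sketch of inspecting the returned history; the exact dictionary keys are an assumption here, following the usual Keras `History.history` convention:

```python
# 'dataset' as returned by data_fusion, as in the example script below
hist = train_model(model, dataset, epoch=10, batch=50, split=0.1)

# Assumed keys, following the Keras History.history convention
print(hist.keys())          # e.g. 'loss', 'acc', 'val_loss', 'val_acc'
print(hist['val_acc'][-1])  # validation accuracy after the last epoch
```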
Another utility function that might be needed is `save_history`, which takes the history output of the trainings and presents it to the user to give a first idea about the performance of the model.
```python
save_history(histories
             ,name='name_of_history'
             ,save=False
             ,extendend=True)
```
- `histories`: a list of dictionaries that are the outputs of `train_model` calls.
- `name`: the name of the complete history files of the model.
- `save`: `True` if you want to save the complete history dictionary in a `.pkl` file.
- `extendend`: `True` if you want to save the plot of the performance as a `.png`.
## Example training script
Now that we have the main functions, let's see an example that uses all of them.
```python
from mly.mlTools import *
from mly.generators import data_fusion
from mly.models import conv_model_1D

CORE     = ['C','C','C','C','F','D','D','DR']
MAX_POOL = [ 2,  0,  0,  2,  0,  0,  0,  0  ]
FILTERS  = [ 8, 16, 32, 64,  0, 64, 32, 0.3 ]
K_SIZE   = [ 3,  3,  3,  3,  0,  0,  0,  0  ]

in_shape = (8192, 3)
lr = 0.0001

PM = [CORE, MAX_POOL, FILTERS, K_SIZE]

model = conv_model_1D(parameter_matrix=PM
                      , INPUT_SHAPE=in_shape
                      , LR=lr
                      , verbose=True)

cbc_data = [['cbc/cbc_real1/HLV_time_cbc_with_real_noise_SNR40_XM'],
            ['cbc/cbc_real1/HLV_time_cbc_with_real_noise_SNR20_XM']]

noise_data = [['noise/real/real1/HLV_time_real_noise_No1_XM'],
              ['noise/real/real1/HLV_time_real_noise_No1_XM']]

history = []

history.append(
    train_model(
        model=model        # model object or path to an already saved model
        , dataset=data_fusion([cbc_data[0], noise_data[0]], [500, 500])
        , epoch=10         # epochs of training
        , batch=50         # batch size of the training
        , split=0.1        # split ratio of TEST / TRAINING data
        , save_model=False # (optional) if you want to save the model, assign a name
        , data_source_path='/home/albert.einstein_/MLy_Workbench/datasets/'
        , model_source_path='/home/albert.einstein_/MLy_Workbench/trainings/'))

history.append(
    train_model(model
                , data_fusion([cbc_data[1], noise_data[1]], [500, 500])
                , 10
                , 50
                , 0.1
                , save_model=True))

save_history(histories=history
             , name='history_final'
             , save=True
             , extendend=True)
```
After all this work you will have your model(s) created and saved in `.h5` files, ready for testing. Testing is the other half of having a useful neural network. For more information about that, go to the False Alarm Test and Accuracy Test pages.
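If you need to reload a saved model before testing, the standard Keras loader works on these `.h5` files. A minimal sketch; the file name here is hypothetical:

```python
from keras.models import load_model

# Hypothetical file name for a model saved by train_model
model = load_model('/home/albert.einstein_/MLy_Workbench/trainings/model_final.h5')
model.summary()
```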