# Feeding the model
Previously, when we were learning how to generate datasets, it was noted that different types of data are not saved all together. Datasets are made and saved independently, to give the user the ability to combine them or use just fractions of them. There are many tools to be used with the data and the networks, which are all in the `mly.mlTools` directory.
## Calling the data you want to train with
After you have your model ready for training, you have to prepare the data to train it. Fortunately, the data generator functions already save the data in a format that will spare you the trouble of finding the right shape for them. All `.mat` files are dictionaries with the following keys and shapes:

```
data_mat_file['data']   --> (N, length, NumberOfDetectors)
data_mat_file['labels'] --> (N, NumberOfLabels)
```
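A minimal sketch of inspecting one of these files, assuming they are standard MATLAB-format files readable with `scipy.io.loadmat`; the file name here is hypothetical:

```python
from scipy.io import loadmat  # assumption: standard MATLAB-format .mat files

# Hypothetical file name; substitute one of your generated datasets
data_mat_file = loadmat('datasets/cbc/example_dataset.mat')

data = data_mat_file['data']      # shape: (N, length, NumberOfDetectors)
labels = data_mat_file['labels']  # shape: (N, NumberOfLabels)
print(data.shape, labels.shape)
```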
Every `.mat` file consists of data with the same label. To combine datasets you have to use the `data_fusion` function.
```python
data_fusion(names
            ,sizes=None
            ,save=False
            ,data_source_file=null_path+'/datasets/')
```
- `names`: The only mandatory input, which has to be a list with the paths of the dataset files to be merged. The origin of the paths is the datasets directory (it can be changed by changing `data_source_file`).
- `sizes`: If this is `None`, it will just merge the datasets. If not, it has to be a list of integers representing the sizes you want from each dataset. In that case, the length of that list has to be the same as the length of `names`.
- `save`: `False` if you don't want to save the merged dataset, and `True` in the rare occasions where you might want to save it.
Finally, it returns two variables: one is the data and the other is the labels. Those two variables can be the input of the training function, as you will see below and in the short sketch that follows.
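A minimal usage sketch, assuming two previously generated dataset files (the paths here are hypothetical):

```python
from mly.generators import data_fusion

# Hypothetical dataset paths, relative to the datasets directory
data, labels = data_fusion(['cbc/HLV_time_cbc_with_real_noise_SNR40_XM',
                            'noise/real/real1/HLV_time_real_noise_No1_XM']
                           ,sizes=[500, 500])  # take 500 samples from each file

print(data.shape)    # (1000, length, NumberOfDetectors)
print(labels.shape)  # (1000, NumberOfLabels)
```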
## Training the model
During training it is very important to save information about the performance of the network after every batch. Unfortunately, Keras does not have an automatic way to save the history of the training in a model object. For that reason I made a function that does the training and also returns the history of that training.
```python
hist = train_model(model
                   ,dataset
                   ,epoch
                   ,batch
                   ,split
                   ,classes=2
                   ,save_model=False
                   ,data_source_path=null_path+'/datasets/'
                   ,model_source_path=null_path+'/trainings/')
```
In this function the model is trained in the background using the Keras function `model.fit(...)`, and it returns a dictionary with the accuracy and loss of the validation and testing data (see the sketch after the parameter list). The parameters are:
- `model`: the model object, or a path to an already existing model that `model_source_path` should indicate.
- `dataset`: a list object with two elements `[data, labels]`, which can be provided by the `data_fusion` function, or a path to an existing dataset that `data_source_path` should indicate.
- `epoch`: the number of epochs to train, as it is called in the Keras `model.fit`.
- `batch`: the batch size, as it is called in the Keras `model.fit`.
- `split`: the proportion of training and testing data to use during training, as it is called in the Keras `model.fit`.
- `classes`: the number of classes used in the model.
- `save_model`: indicates whether you want to save the model. If you want to save it, you should give this variable a name for the saved model.
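A minimal sketch of inspecting the returned history; the exact dictionary keys are an assumption here, following the usual Keras `History.history` convention:

```python
# 'dataset' as returned by data_fusion, as in the example script below
hist = train_model(model, dataset, epoch=10, batch=50, split=0.1)

# Assumed keys, following the Keras History.history convention
print(hist.keys())          # e.g. 'loss', 'acc', 'val_loss', 'val_acc'
print(hist['val_acc'][-1])  # validation accuracy after the last epoch
```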
Another utility function that might be needed is `save_history`, which takes the history output of the trainings and presents it to the user to give a first idea about the performance of the model.
```python
save_history(histories
             ,name='name_of_history'
             ,save=False
             ,extendend=True)
```
- `histories`: a list of dictionaries that are the outputs of `train_model` calls.
- `name`: the name of the complete history files of the model.
- `save`: `True` if you want to save the complete history dictionary in a `.pkl` file.
- `extendend`: `True` if you want to save the plot of the performance as a `.png`.
## Example training script
Now that we have the main functions, let's see an example that uses all of them.
```python
from mly.mlTools import *
from mly.generators import data_fusion
from mly.models import conv_model_1D

CORE     = ['C','C','C','C','F','D','D','DR']
MAX_POOL = [ 2,  0,  0,  2,  0,  0,  0,  0  ]
FILTERS  = [ 8, 16, 32, 64,  0, 64, 32, 0.3 ]
K_SIZE   = [ 3,  3,  3,  3,  0,  0,  0,  0  ]

in_shape = (8192, 3)
lr = 0.0001

PM = [CORE, MAX_POOL, FILTERS, K_SIZE]

model = conv_model_1D(parameter_matrix=PM
                      , INPUT_SHAPE=in_shape
                      , LR=lr
                      , verbose=True)

cbc_data = [['cbc/cbc_real1/HLV_time_cbc_with_real_noise_SNR40_XM'],
            ['cbc/cbc_real1/HLV_time_cbc_with_real_noise_SNR20_XM']]

noise_data = [['noise/real/real1/HLV_time_real_noise_No1_XM'],
              ['noise/real/real1/HLV_time_real_noise_No1_XM']]

history = []

history.append(
    train_model(
        model=model        # model object or path to an already saved model
        , dataset=data_fusion([cbc_data[0], noise_data[0]], [500, 500])
        , epoch=10         # epochs of training
        , batch=50         # batch size of the training
        , split=0.1        # split ratio of TEST / TRAINING data
        , save_model=False # (optional) if you want to save the model, assign a name
        , data_source_path='/home/albert.einstein_/MLy_Workbench/datasets/'
        , model_source_path='/home/albert.einstein_/MLy_Workbench/trainings/'))

history.append(
    train_model(model
                , data_fusion([cbc_data[1], noise_data[1]], [500, 500])
                , 10
                , 50
                , 0.1
                , save_model=True))

save_history(histories=history
             , name='history_final'
             , save=True
             , extendend=True)
```
After all this work you will have your model(s) created and saved in `.h5` files, ready for testing. Testing is the other half of having a useful neural network. For more information about that, go to the False Alarm Test and Accuracy Test pages.
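If you need to reload a saved model before testing, the standard Keras loader works on these `.h5` files. A minimal sketch; the file name here is hypothetical:

```python
from keras.models import load_model

# Hypothetical file name for a model saved by train_model
model = load_model('/home/albert.einstein_/MLy_Workbench/trainings/model_final.h5')
model.summary()
```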