
#NNBlocks User Guide

The NNBlocks framework and this guide are currently under heavy development.

This guide will try to cover all of NNBlocks' architecture, the implemented Models and how someone can extend the framework.

For suggestions about the framework, please open an issue.

###Table of Contents

  1. Introduction
  2. Installing
  3. Models
  4. The Model Class
  5. Connecting Models
  6. The CustomModel
  7. Models Inside Models
  8. Reading Models Documentation
  9. Trainers
  10. The TrainSupervisor
  11. Using the TrainSupervisor with Default Options
  12. Custom Procedures
  13. Neural Network Models
  14. The Standard Models
  15. The RecursiveNeuralNetwork
  16. The RecurrentNeuralNetwork
  17. Other Utilities

##Introduction

NNBlocks is a framework born from the need to build neural models for the task of semantic compositionality. The problem with other Deep Learning frameworks is that they are not flexible enough to build the newest linguistics neural models available. NNBlocks tries to take care of this problem by providing a very flexible set of "blocks" and the means to connect them in order to build any kind of neural network architecture.

NNBlocks also tries to implement the latest advances of Artificial Neural Networks research. This way these techniques are promptly available to the linguistics community, even if they are not already used in the area.

In theory NNBlocks can be used for any kind of task, but the development of the framework will always favor the needs of linguistics tasks over others. Because of this, a lot of the "blocks" already implemented seem to be made to operate on word vectors. For example, all the 'insize' parameters of the Models refer to the length of the input word vectors.

##Installing

NNBlocks has a setup.py script in its root directory. The easiest way to install NNBlocks is by having setuptools installed and running the command:

sudo python setup.py install

If you don't have and don't want to install setuptools, you will have to put NNBlocks in your PYTHONPATH and install the dependencies manually. To use NNBlocks in Unix systems, for example, one can first install Theano and, after that, do the following:

cd ~
git clone https://github.com/NNBlocks/NNBlocks
cd NNBlocks
export PYTHONPATH=$PYTHONPATH:`pwd`

Now you can simply import nnb in Python to use NNBlocks.

If you close the terminal you will have to export the PYTHONPATH variable again. To make the export permanent, append it to the .bashrc file or equivalent.
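
For example, assuming NNBlocks was cloned into your home directory as shown above, the following command appends the export to .bashrc:

echo "export PYTHONPATH=\$PYTHONPATH:$HOME/NNBlocks" >> ~/.bashrc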

If you want to use NNBlocks' plotting tools, you should also install matplotlib.

##Models

The first thing to know about NNBlocks is how it does all of its computations. In NNBlocks we call all of the "blocks" Models. These Models are responsible for taking a certain number of inputs, applying some kind of computation and outputting a certain number of outputs. A Model can also have a certain number of tunable parameters that will be adjusted later via gradient methods.

How these Models are made and connected to each other will be presented in this section.

####The Model Class

Now we will explain the nnb.Model class. If you just want to use the Models available, feel free to skip this subsection.

Every Model in NNBlocks extends the nnb.Model class and overrides some methods. The three main methods that should be overridden are:

#####init_options()

This method is responsible for declaring any wanted options for the Model, such as the activation function, window size etc. This method should return an nnb.utils.Options instance filled with the wanted options. You will normally want to mark this method as @staticmethod, as it does not depend on a Model instance.

Example:

import nnb
class FooModel(nnb.Model):
    @staticmethod
    def init_options():
        opts = nnb.utils.Options()
        opts.add(
            name='insize',
            value_type=int,
            required=True
        )
        #activation_func is not a required option
        #The default value is nnb.activation.sigmoid
        opts.add(
            name='activation_func',
            value=nnb.activation.sigmoid
        )
        return opts

f = FooModel()
#ValueError! Should specify 'insize'
f = FooModel(insize=2.5)
#ValueError! 'insize' should be of type int
f = FooModel(insize=3)
#OK!

If you want more information on the nnb.utils.Options class, please read its class documentation.

#####init_params(self)

This second method concerns the tunable parameters of the Model. These tunable parameters are often weight matrices, but they can be anything.

Since NNBlocks is built on top of Theano, these tunable parameters are instances of Theano shared variables. Please take a look at their guide to see how to create such instances. Don't worry, it's quite easy!

The init_params(self) method should always return a list of Theano shared variables. These shared variables will be the tunable parameters of the Model. Be careful when declaring tunable parameters, because if they are not used inside the Model, NNBlocks won't be able to compile the network.

Example:

import nnb
import numpy as np
import theano
class BarModel(nnb.Model):
    @staticmethod
    def init_options():
        opts = nnb.utils.Options()
        opts.add(
            name='insize',
            value_type=int,
            required=True
        )
        opts.add(
            name='outsize',
            value_type=int,
            required=True
        )
        return opts

    def init_params(self):
        insize = self.options.get('insize')
        outsize = self.options.get('outsize')

        W = np.random.uniform(size=(insize, outsize))
        W = theano.shared(value=W, name='W')

        b = np.zeros(shape=(outsize,))
        b = theano.shared(value=b, name='b')

        return [W, b]

b = BarModel(insize=10, outsize=5)
print b.params
#[W, b]

As we can see in this example, the BarModel has two parameters, W, a matrix, and b, a vector. This example also introduces two things: How to access the filled nnb.utils.Options instance declared at the init_options() method and how to access a Model's tunable parameters.

This example is not practical yet, since BarModel doesn't do any computations. The next method will address this.

#####apply(self, inputs)

This method is responsible for actually doing the computations. Be aware that everything inside this method must be handled using Theano variables. Again, please refer to their guide on how to handle these variables. Normally this is done in the same way as dealing with numpy ndarrays, but you should keep in mind that you are always dealing with Theano variables, not ndarrays.

The inputs parameter of this method will always be a list of Theano variables. These variables are the Model's inputs. One should always check the number and ndim property of these inputs and raise an appropriate Error if they don't conform to the expected input. We check only the ndim property because we are not able to determine the complete shape of the input when this method is called.

The apply(self, inputs) method should always return a list of Theano variables. These variables are the wanted computation for the Model. This may sound confusing if you are not familiar with Theano, but the example will show that this is not confusing at all.

Example:

import nnb
import numpy as np
import theano
class SimplePerceptronLayer(nnb.Model):
    @staticmethod
    def init_options():
        opts = nnb.utils.Options()
        opts.add(
            name='insize',
            value_type=int,
            required=True
        )
        opts.add(
            name='outsize',
            value_type=int,
            required=True
        )
        return opts

    def init_params(self):
        insize = self.options.get('insize')
        outsize = self.options.get('outsize')

        W = np.random.uniform(size=(insize, outsize))
        W = theano.shared(value=W, name='W')

        b = np.zeros(shape=(outsize,))
        b = theano.shared(value=b, name='b')

        return [W, b]

    def apply(self, inputs):
        if len(inputs) != 1 or inputs[0].ndim != 2:
            raise ValueError("Invalid inputs: {0}".format(inputs))
        W = self.params[0]
        b = self.params[1]
        x = inputs[0]

        return [nnb.activation.sigmoid(x.dot(W) + b)]

This example shows how the three main methods of the nnb.Model class play a part in implementing a simple Perceptron layer of a Neural Network. This Model will accept a single matrix as input and output a single matrix. Keep reading to know how this Model can be used.

####Connecting Models

Single models are of little use if there is no way to connect them. This section will explain how one can use Horizontal Joins and Vertical Joins to declare a fully functional Model.

#####Horizontal Joins

The simplest join is the Horizontal Join, which simply takes all the outputs of a Model and forwards them to the inputs of another. One can declare a Horizontal Join using the nnb.HorizontalJoinModel, but NNBlocks features a simpler way to do this: the bitwise OR operator (|).

You can visualize a Horizontal Join like this:

[Figure: Horizontal Join]

Let's use the previously declared SimplePerceptronLayer to build a useful Model:

inp = nnb.InputLayer(ndim=2, name='network_inputs')
l1 = SimplePerceptronLayer(insize=5, outsize=50)
l2 = SimplePerceptronLayer(insize=50, outsize=30)
l3 = SimplePerceptronLayer(insize=30, outsize=5)

neuralnet = inp | l1 | l2 | l3
feedforward_func = neuralnet.compile()
print feedforward_func([[0.1, 0.8, 0.2, 0.6, 0.3]])

Now we are getting somewhere. We used 4 neural network layers of sizes 5 (the input layer), 50, 30 and 5. The input layer is crucial here: every Model needs a user input to be the input of the first block, and the way this user input is created is with the nnb.InputLayer Model.

In this example we can also see how to compile a model, using the compile() method of the nnb.Model class. This method will collect all of the Models' inputs and pass them along the Models graph. If you are familiar with Theano and would like to handle the input/output variables directly, use the get_io() method.
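
If you go the get_io() route, note that its exact return value is an assumption in the sketch below (we assume it returns the graph's input variables and output variables), so check the method's documentation before relying on it:

import theano

#Assumption: get_io() returns the lists of Theano input and output variables
inputs, outputs = neuralnet.get_io()
feedforward_func = theano.function(inputs, outputs)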

We will soon cover how to train these Models' tunable parameters (weight matrices, in this case), but there are a couple of things you should read first, so try not to skip the whole guide.

#####Vertical Joins

The second way to join Models is via the nnb.VerticalJoinModel. This join, in its simplest use case, takes a Model's outputs and forwards them to the inputs of two Models concurrently. After both Models have had their apply(self, inputs) method called with the same inputs, their outputs are combined to form the next Model's inputs. For example, if the first Model in the Vertical Join has 3 outputs and the second Model has 2 outputs, the next Model receiving their outputs will get an inputs list of length 5.

Visually, one can understand the Vertical Join like this:

[Figure: Vertical Model]

Vertical Joins are more flexible than the last figure shows. They can be used, for example, to insert a new user input in the middle of the model:

[Figure: Vertical Model 2]

Just like the Horizontal Join, the Vertical Join does not need to be created using the nnb.VerticalJoinModel. One can simply use the bitwise AND operator (&).

In this next example we will show two neural networks (created with the previously declared SimplePerceptronLayer), each with its own input layer, being stacked with a third neural network.

inp1 = nnb.InputLayer(ndim=2, name="input_first_nn")
l11 = SimplePerceptronLayer(insize=5, outsize=50)
l12 = SimplePerceptronLayer(insize=50, outsize=30)
l13 = SimplePerceptronLayer(insize=30, outsize=5)

inp2 = nnb.InputLayer(ndim=2, name="input_second_nn")
l21 = SimplePerceptronLayer(insize=3, outsize=30)
l22 = SimplePerceptronLayer(insize=30, outsize=40)
l23 = SimplePerceptronLayer(insize=40, outsize=5)

nn1 = inp1 | l11 | l12 | l13
nn2 = inp2 | l21 | l22 | l23
nn3 = SimplePerceptronLayer(insize=10, outsize=5)

concat_model = nnb.ConcatenationModel(axis=1)

final_model = (nn1 & nn2) | concat_model | nn3
feedforward_func = final_model.compile()
feedforward_func([[0.1, 0.8, 0.2, 0.6, 0.3]], [[0.5, 0.2, 0.6]])

This is a more extensive example, with 3 neural networks: two with 3 layers and one with a single layer. Take note of the nnb.ConcatenationModel shown here. This is one of the useful Models provided by NNBlocks: it just concatenates its inputs along the specified axis, in this case axis 1.

This example starts to show what we mean when we say that NNBlocks is flexible, but we are not quite done showing its full flexibility. The next two subsections will continue to present this.

####The CustomModel

When we started coding NNBlocks to handle compositionality models, we realized that it was impossible to cover every single technique in the compositionality literature without turning the framework into a simple ad hoc one. So the first big decision of the project was: "Let's simply not implement all the models, but make it possible to implement them easily". The nnb.CustomModel is the simplest way of implementing your own Model, without even needing to extend the nnb.Model class.

Although easy to use, you will quickly realize that Models created with the nnb.CustomModel are not well suited for reuse across various parts of a project. If you need a reusable Model, we recommend implementing your own Model by extending the nnb.Model class.

The nnb.CustomModel has two initialization parameters. The first, the params parameter, is simply a list of numpy ndarrays that you would like to be tunable parameters. The second, the fn parameter, is a function that will do the computation of the outputs, given the inputs and the tunable parameters. Imagine the fn function as being the equivalent of the apply(self, inputs) method, but with slightly different parameters.

The parameters passed to the fn function will be, respectively, the Model's inputs (one function parameter for each input) and the tunable parameters (again, one function parameter for each tunable parameter). Unlike the apply(self, inputs) method, the fn function is not required to return a list of outputs. It can return a single output, a tuple of outputs or, again, a list of outputs. The following example will make all of this clear.

inp1 = nnb.InputLayer(ndim=1, name="vector_input")
inp2 = nnb.InputLayer(ndim=0, name="scalar_input")

vector_param = np.ones(shape=(5,))
scalar_param = np.asarray(2.)
tunable_params = [vector_param, scalar_param]

def custom_func(vector_input, scalar_input, vector_param, scalar_param):
    return vector_input.dot(vector_param) * (scalar_input / scalar_param)

custom_model = nnb.CustomModel(params=tunable_params, fn=custom_func)

final_model = (inp1 & inp2) | custom_model
feedforward_func = final_model.compile()
feedforward_func([2, 1, 5, 2, 6], 2.5)

This Model, which does the simple computation x1.dot(lambda1) * (x2 / lambda2), has 2 inputs (denoted by x's) and 2 tunable parameters (denoted by lambdas). This computation might be useless, but the concept of implementing an arbitrary computation that easily is the important part. You are never constrained by the implemented Models of NNBlocks; they are just there to save you some time.

Thanks to Theano, without a single extra line of code, your Custom Models will be automatically differentiated so the tunable parameters can be adjusted. They can even run on GPU just the way we wrote them in the example.

####Models Inside Models

Another great thing that drives the flexibility of NNBlocks is the ability to use auxiliary Models inside your Model. That way, you can have abstract Models with flexible behaviour.

To use an auxiliary Model, you need to do 3 things:

  1. Make the auxiliary Model part of your Model's options. This is the best way to parametrize your Model's behaviour.

        @staticmethod
        def init_options():
            opts = nnb.utils.Options()
    
            #Insert your Model's options here
    
            opts.add(
                name='model',
                value_type=nnb.Model,
                required=True
            )
            return opts
    
  2. Make the auxiliary Model's tunable parameters part of your Model's param list. NNBlocks can't reach the auxiliary Model's tunable parameters, so you will have to adopt them as part of your Model.

        def init_params(self):
            my_params = []
    
            #Insert your Model's tunable parameters here
    
            model = self.options.get('model')
            my_params += model.params
    
            return my_params
    
  3. Detect the auxiliary Model's user inputs. Sometimes the auxiliary Model has its own user inputs (by having InputLayers in it). These user inputs are also not reachable by NNBlocks, so you should adopt them too. This can be done by overriding the _get_inputs(self) method like so:

        def _get_inputs(self):
            model = self.options.get('model')
            return model._get_inputs()
    

With these steps you can now simply use your own auxiliary Model! Be careful to always wrap the Model's inputs in a list and remember the Model's outputs are also in a list.

    def apply(self, inputs):
        x = inputs[0]

        #Insert your Model's logic here

        model = self.options.get('model')
        y = model.apply([x])[0]

        #Continue your logic here

        return [y]
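
Putting the three steps together, here is a minimal sketch of a Model that uses an auxiliary Model. The DoubleApplyModel class is a made-up example that relies only on the methods described above: it simply applies its auxiliary Model twice to a single input.

    import nnb

    class DoubleApplyModel(nnb.Model):
        @staticmethod
        def init_options():
            opts = nnb.utils.Options()
            #Step 1: the auxiliary Model is an option of this Model
            opts.add(
                name='model',
                value_type=nnb.Model,
                required=True
            )
            return opts

        def init_params(self):
            #Step 2: adopt the auxiliary Model's tunable parameters
            model = self.options.get('model')
            return list(model.params)

        def _get_inputs(self):
            #Step 3: adopt the auxiliary Model's user inputs, if any
            model = self.options.get('model')
            return model._get_inputs()

        def apply(self, inputs):
            x = inputs[0]
            model = self.options.get('model')
            #The input is wrapped in a list and the single output is unwrapped
            y = model.apply([x])[0]
            y = model.apply([y])[0]
            return [y]

    #A hypothetical use: applying the same PerceptronLayer twice
    double_layer = DoubleApplyModel(model=nnb.PerceptronLayer(insize=5, outsize=5))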

The full awesomeness of auxiliary Models is seen when you are working with things like Recursive/Recurrent Neural Networks, where the structure of the computations is always the same, but the way they are done is always different, with things like LSTM, Recursive Neural Tensor Networks, Matrix-Vector RNN etc.

####Reading Models Documentation

When building a Model with NNBlocks, a nice thing to know is how to read a Model's documentation. A Model's way of processing its inputs is not described in the apply(self, inputs) method's docstring, just as the Model's tunable parameters are not described in the init_params(self) method's docstring. Instead, we decided to put all of a Model's information in the class docstring. Here is a demonstration of how this docstring is written:

class FooModel(nnb.Model):
    """Few words about what the Model is
    Longer description of the Model and some observations. After this
    description, the first thing to write is the Model's initialization
    parameters. After that the inputs, outputs and tunable parameters are
    described.

    :param param1: Optional int. A toy example parameter and what it does. If
        not set, the default value is 42.
    :param param2: Required int or list of ints. Another toy example parameter.
    :param model: Required Model. Auxiliary Model for the toy example.

    Inputs:
        A description about the Model's inputs. The description must contain
            the expected number of inputs, their shape description and any other
            constraints applied.

    Outputs:
        Just like the Inputs section, this section contains all information
            about the Model's outputs.

    Tunable Parameters:
        Here is described the tunable parameters in the Model. Normally this
            will be an ordered list with the parameters' names. Their order must
            be the same as the parameters are found in the `params` list of the
            Model.
    """

    #rest of the Model's implementation...

As an example of this format, here is the nnb.PerceptronLayer's documentation:

class PerceptronLayer(Model):
    """A Perceptron layer
    This Model implements a Perceptron Layer for Neural Networks. The horizontal
    joining of several of these layers forms a Multilayer Perceptron.
    This Model will take a single input x and output
    activation_func(x.dot(W) + b).

    :param insize: Required int. The input size. If the input of this model is
        a vector, insize will be the vector's length. If the input is a matrix,
        insize will be the length of each row.
    :param outsize: Required int. The output size. This can be thought as the
        layer's number of neurons. If the input is a vector, outsize is the
        length of the output vector. If the input is a matrix, outsize is the
        length of each row of the output matrix.
    :param activation_func: Optional callable object. This is the activation
        function used in the weighted average of the input vector. This function
        should use only theano operations. Default is nnb.activation.sigmoid
    :param W: Optional numpy ndarray with ndim=2. If set, the weights of this
        layer are not randomly initialized. Instead they are set to this
        parameter's value.
    :param b: Optional numpy ndarray with ndim=1. If set, the bias vector of
        this layer is not initialized with zeros. Instead it is set to this
        parameter's value.

    Inputs:
        A single input x with x.ndim=1 or x.ndim=2. x.shape[-1] should be equal
            to the insize parameter

    Outputs:
        A single output y with y.ndim=x.ndim and y.shape[-1] equal to the
            outsize parameter.

    Tunable Parameters:
        W - Weight matrix
        b - Bias vector
    """

    #rest of the Model's implementation...

One thing to notice here is that we are not afraid to write some numpy notation in the middle of the documentation, like x.shape[-1]. This is the easiest way to describe the inputs' and outputs' shapes; any other way would get confusing or ambiguous.

##Trainers

The reason we are building our Models is that we want to be able to train their tunable parameters for some given task. The way NNBlocks does this is with Trainers.

What Trainers do, basically, is minimize a Model's single output, given some set of inputs. So the first thing to be noticed here is that the Model being trained needs to have a single output. The second thing to notice is that a Trainer doesn't know what a cost function is. Instead it just tries to minimize any given function.

Let's take a look at an example:

nn_inp = nnb.InputLayer(ndim=2)
l1 = nnb.PerceptronLayer(insize=10, outsize=50)
l2 = nnb.PerceptronLayer(insize=50, outsize=20)
sm = nnb.SoftmaxLayer(insize=20, outsize=5)

network = nn_inp | l1 | l2 | sm

expected_output = nnb.InputLayer(ndim=1, dtype='int32')
cost_func = nnb.cost.NegativeLogLikelihoodError()

network_cost = (network & expected_output) | cost_func

trainer = nnb.train.AdagradTrainer(model=network_cost, learning_rate=0.1)

example1_inp = np.asarray(
    [
        [0.1, 0.4, 0.2, 0.6, 0.2, 0.5, 0.2, 0.7, 0.1, 0.6],
        [0.6, 0.8, 0.1, 0.8, 0.1, 0.9, 0.9, 0.6, 0.3, 0.7],
        [0.2, 0.3, 0.7, 0.2, 0.8, 0.2, 0.3, 0.4, 0.6, 0.4],
        [0.7, 0.7, 0.3, 0.8, 0.3, 0.5, 0.1, 0.2, 0.3, 0.9]
    ]
)
example1_expected = np.asarray(
    [0, 2, 4, 2]
)
example1 = [example1_inp, example1_expected]
example2_inp = np.asarray(
    [
        [0.5, 0.7, 0.1, 0.6, 0.2, 0.1, 0.8, 0.2, 0.7, 0.6],
        [0.2, 0.4, 0.6, 0.2, 0.3, 0.3, 0.2, 0.6, 0.3, 0.2],
        [0.1, 0.2, 0.2, 0.7, 0.8, 0.2, 0.5, 0.5, 0.3, 0.7]
    ]
)
example2_expected = np.asarray(
    [2, 2, 3]
)
example2 = [example2_inp, example2_expected]

trainer.train([example1, example2])

In this example, we build a 3-layer neural network with a softmax function at the end. This neural network is prepared to receive input vectors of length 10 and it will output probabilities for 5 classes. After building our Model we would like to minimize a negative log likelihood cost function, so we append it to our Model and pass it to the Trainer.

Now we are ready to use the train(self, inputs) method of the Trainer. This method will take a list of the Model's inputs and adjust the tunable parameters once. In the example we create a stub dataset and pass it to the Trainer, which will do a single batch training.

To implement a simple mini-batch training with a Trainer we can write:

dataset = get_dataset()
trainer = get_trainer()
epoch_num = 0
BATCH_SIZE = 20

while continue_training():
    print "EPOCH {0}".format(epoch)
    for i in range(len(dataset) / BATCH_SIZE):
        start_index = i * BATCH_SIZE
        end_index = (i + 1) * BATCH_SIZE

        trainer.train(dataset[start_index:end_index])

Using Trainers, anyone can train a Model, but NNBlocks provides a further set of training tools. These tools are contained in the nnb.train.TrainSupervisor, which the next section will discuss.

Trainers also have a built-in capability to add L2 and L1 regularization. This can be done by passing a dictionary to the L2_reg and L1_reg parameters.

nn_inp = nnb.InputLayer(ndim=2)
l1 = nnb.PerceptronLayer(insize=10, outsize=50)
l2 = nnb.PerceptronLayer(insize=50, outsize=20)
sm = nnb.SoftmaxLayer(insize=20, outsize=5)

network = nn_inp | l1 | l2 | sm

expected_output = nnb.InputLayer(ndim=1, dtype='int32')
cost_func = nnb.cost.NegativeLogLikelihoodError()

network_cost = (network & expected_output) | cost_func

L2 = {
    tuple(l1.params): 1e-3,
    tuple(l2.params): 1e-4,
    tuple(sm.params): 1e-3
}

trainer = nnb.train.AdagradTrainer(model=network_cost, learning_rate=0.1,
                                    L2_reg=L2)

##The TrainSupervisor

The TrainSupervisor is still very new, so please leave some suggestions/opinions about it in our Issue tracker. Thank you!

When writing the code to train a Model, one will often write functions to do mini-batch training, plotting, evaluating, keeping the best tunable parameters etc. The nnb.train.TrainSupervisor class tries to encapsulate all of these procedures. This way, if properly set, a Model can be trained with a single method call.

The TrainSupervisor takes a Trainer and an evaluation Model. The Trainer is used to adjust the tunable parameters and the evaluation Model is applied to a given evaluation dataset. This evaluation can be used to choose the best parameters, plot cost curves, do early stopping and, with the help of custom procedures, anything you want. Here is an example of how to use the TrainSupervisor:

import theano.tensor as T
import nnb

dataset = get_dataset()
datasplit = int(len(dataset) * 0.8)
train_dataset = dataset[:datasplit]
eval_dataset = dataset[datasplit:]

network = get_model()
expected_out = nnb.InputLayer(ndim=1, dtype='int32')
cost_func = nnb.cost.NegativeLogLikelihoodError()
cost = (network & expected_out) | cost_func

trainer = nnb.train.SGDTrainer(model=cost)

def errors(p_y, expected_y):
    predicted_y = T.argmax(p_y, axis=1)
    return 1 - T.mean(T.eq(predicted_y, expected_y))

eval_func = nnb.CustomModel(fn=errors)
eval_model = (network & expected_out) | eval_func

train_sup = nnb.train.TrainSupervisor(trainer=trainer,
    dataset=train_dataset, eval_model=eval_model,
    eval_dataset=eval_dataset, eval_interval=1,
    eval_model_is_cost=True, max_no_improve=10, epochs_num=200,
    permute_train=True, batch_size=20, plot=True)

train_sup.train()

Here we are trying to train a model for a classification task. In this example we can see a lot of parameters for the TrainSupervisor:

  • trainer -- The trainer the TrainSupervisor will use
  • dataset -- The dataset that will be iterated in the batch training, much like the example of the previous section
  • eval_model -- Model used to evaluate the tunable parameters. If this is not set, the Model passed to the Trainer will be used as an evaluation Model and the eval_model_is_cost is set to True
  • eval_dataset -- When evaluating, the TrainSupervisor will use the eval_model in this dataset. If this parameter is not set, the TrainSupervisor will make a copy of the dataset parameter and use it to evaluate
  • eval_interval -- This integer will tell how many epochs will go by before evaluating. Default value is 1
  • eval_model_is_cost -- This parameter lets the TrainSupervisor know that the evaluation Model is a cost function. This cost function doesn't need to be exactly the same as the cost used for the Trainer. When this parameter is True, the TrainSupervisor can keep the best tunable parameters based on the evaluation Model, do early stopping and plot the evaluation cost while training. Setting this parameter does cost you some flexibility in your training, as the TrainSupervisor then can't use external scripts for evaluation or a more sophisticated approach to choosing the best parameters.
  • max_no_improve -- When eval_model_is_cost is set to True, the TrainSupervisor will use this parameter to implement early stopping. It tells how many epochs can go by without an improvement in the evaluation Model's result
  • epochs_num -- The maximum number of epochs to train. This is optional
  • permute_train -- If True the dataset will be shuffled every epoch. Default is True
  • batch_size -- The mini-batch size. If not specified, batch_size = len(dataset)
  • plot -- If eval_model_is_cost is True, the TrainSupervisor will use matplotlib to plot a line where the x axis is the number of epochs and the y axis is the mean cost using the evaluation Model in the evaluation dataset

The TrainSupervisor's behavior can be further specified by using the custom procedures, as we shall see.

####Custom Procedures

The custom_procedures parameter of the TrainSupervisor is a list of functions, each taking a single parameter. These functions will be called with a TrainingDescriptor object every epoch. For information about this class, take a look at its documentation in the NNBlocks/nnb/train/train_supervisor.py file.

Let's say we have an external function to evaluate a Model and we want to early stop the training as soon as this evaluation metric hits a certain threshold. This next example illustrates this situation:

dataset = get_dataset()
datasplit = int(len(dataset) * 0.8)
train_dataset = dataset[:datasplit]
eval_dataset = dataset[datasplit:]

network = get_model()
expected_out = nnb.InputLayer(ndim=1, dtype='int32')
cost_func = nnb.cost.NegativeLogLikelihoodError()
cost = (network & expected_out) | cost_func

trainer = nnb.train.SGDTrainer(model=cost)

def custom_eval(descriptor):
    metric = external_eval_func(descriptor.last_eval_results)
    if metric <= THRESHOLD:
        raise nnb.train.StopTraining()

train_sup = nnb.train.TrainSupervisor(trainer=trainer, eval_model=network,
    custom_procedures=[custom_eval], dataset=train_dataset,
    eval_dataset=eval_dataset, epochs_num=200, batch_size=20)

train_sup.train()

Now, using the nnb.train.StopTraining exception, our training will stop early. If we wanted to execute this custom procedure every 5 epochs instead of every epoch, we could replace custom_procedures=[custom_eval] with custom_procedures=[(5, custom_eval)].

Using this feature, you can train your Models in any way you want. Just set up some custom procedures to plot, save parameters, stop early etc. and you are good to go.

##Neural Network Models

####The Standard Models

####The RecursiveNeuralNetwork

####The RecurrentNeuralNetwork

##Other Utilities