4. Torch

Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.

Table of contents :

  1. Syntax

  2. Tensors

  3. Neural Network

    A. Basic Functions

    B. Load and normalize

    C. Normalize natural image

    D. Training the network

  4. Neural Network examples

Syntax

Object Method

If z is a Torch object, we can apply its methods by using : instead of the . used in Python. For example: z:size().

Matrix, arrays, tensors

Indexing begins at 1. To access an element of a 3D tensor, we use x[3][4][5].

The entire set of a 3D array is x[{ {}, {}, {} }]. In order to pick a range of elements along one dimension (from i1 to i2 inclusive), we can do x[{ {i1, i2}, {}, {} }].
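
For instance, a short sketch of these indexing forms:

 x = torch.Tensor(3,4,5):fill(0)  -- a 3x4x5 tensor filled with zeros
 x[3][4][5] = 7                   -- set a single element
 y = x[{ {}, {}, {} }]            -- select the entire tensor
 z = x[{ {1,2}, {}, {} }]         -- rows 1 to 2 of the first dimension (a 2x4x5 view)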

Numpy users

Types

Numpy        Torch
np.ndarray   torch.Tensor
np.float32   torch.FloatTensor
np.float64   torch.DoubleTensor
np.uint8     torch.ByteTensor

Constructors

Numpy                       Torch
np.empty([2,2])             torch.Tensor(2,2)
np.eye                      torch.eye
np.ones                     torch.ones
np.array([ [1,2],[3,4] ])   torch.Tensor({{1,2},{3,4}})
np.ascontiguousarray(x)     x:contiguous()
np.copy(x)                  x:clone()

Numerical Ranges

Numpy                  Torch
np.arange(10)          torch.linspace(0,9,10)
np.arange(2, 3, 0.1)   torch.linspace(2, 2.9, 10)

Attributes and Methods

Numpy              Torch
x.shape            x:size()
x.ndim             x:dim()
x.data             x:data()
x.size             x:nElement()
x.size == y.size   x:isSameSizeAs(y)
x.reshape          x:reshape
x.transpose        x:transpose()

Item selection and manipulation

Numpy                 Torch
np.take(a, indices)   a[indices]
x[:,0]                x[{ {},1 }]
np.sort               sorted, indices = torch.sort(x, [dim])
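
As a small example of the last entry (a sketch):

 x = torch.Tensor({{3,1,2},{9,7,8}})
 sorted, indices = torch.sort(x, 2)   -- sort along dimension 2 (each row)
 print(sorted)                        -- 1 2 3 / 7 8 9
 print(indices)                       -- positions of the sorted values in x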

Tensors

A Tensor is the main Torch data type, and probably the most important, because it handles numeric data. Tensors are serializable (meaning they can be translated into a format that can be stored). A tensor is a kind of multi-dimensional matrix. There are several types of tensors (byte, int, float, double...). To define a Tensor:

 --- creation of a 4D-tensor 4x5x6x2
 z = torch.Tensor(4,5,6,2)
 --- for more dimensions, (here a 6D tensor) one can do:
 s = torch.LongStorage(6)
 s[1] = 4; s[2] = 5; s[3] = 6; s[4] = 2; s[5] = 7; s[6] = 3;
 x = torch.Tensor(s) ---[torch.DoubleTensor of size 4x5x6x2x7x3]

One could say that a Tensor is a particular way of viewing a Storage: a Storage only represents a chunk of memory, while the Tensor interprets this chunk of memory as having dimensions. Storages are basically a way for Lua to access the memory of a C pointer or array. Storages can also map the contents of a file to memory. A Storage is an array of basic C types. For arrays of Torch objects, use Lua tables. x:storage() returns the Storage used to store all the elements of the Tensor x, i.e. all of the data contained in x.
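
A small sketch of this Tensor/Storage relationship:

 x = torch.Tensor(2,3):fill(1)
 s = x:storage()       -- the underlying chunk of 6 doubles
 s[1] = 5              -- writing to the Storage writes to the Tensor
 print(x[1][1])        -- 5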

Constructor

torch.Tensor() returns an empty tensor. torch.Tensor(sizes, [strides]) creates a tensor with any number of dimensions.

torch.Tensor(n) : a vector of size n

Functions

z:fill(n) fills the tensor with the value n.

z:t() transposes z (for a 2D tensor).

z:nDimension() returns the number of dimensions.

z:size() returns the size.

z:stride(n) returns the jump necessary to go from one element to the next one in the specified dimension n. Note also that in Torch, elements in the same row [elements along the last dimension] are contiguous in memory for a matrix [tensor].

x:isContiguous() returns true if the elements of the Tensor are contiguous in memory.
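
For example, for a 4x5 matrix:

 x = torch.Tensor(4,5)
 print(x:stride(1))        -- 5 : jump 5 elements to reach the next row
 print(x:stride(2))        -- 1 : elements within a row are contiguous
 print(x:isContiguous())   -- true
 y = x:t()                 -- the transpose is only a different view of the same Storage
 print(y:isContiguous())   -- false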

dst = src:type('torch.TypeTensor') converts the data type; shortcuts are available, e.g. trainData.data = trainData.data:float() to convert to Float.

Tensors can be moved onto the GPU using the :cuda() function, after require 'cutorch'.
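
A minimal sketch, assuming cutorch is installed and a CUDA device is available:

 require 'cutorch'
 x = torch.Tensor(4,5):fill(1)
 x = x:cuda()              -- x is now a torch.CudaTensor stored on the GPU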

Neural Network

Neural networks in Torch can be constructed using the nn package: require 'nn'.

A. Basic Functions

B. Load and normalize

    C. Normalize natural image

D. Training the network

Modules are the bricks used to build neural networks. Each module is itself a neural network, but can be combined with other networks using containers to create complex neural networks.

A. Basic functions

model = nn.Sequential() : creates a container that cascades modules in sequence

model:add(nn.Reshape(ninputs)) : reshapes the input to a vector

model:add(nn.Linear(ninputs,nhiddens)) : adds a Linear module with ninputs inputs and nhiddens hidden units

model:add(nn.Tanh()) : adds a tanh module

model:add(nn.ReLU()) : adds the ReLU activation function (rectifier: f(x) = max(0, x))

criterion = nn.ClassNLLCriterion() : creates a loss function module, a negative log-likelihood criterion for multi-class classification

criterion = nn.MSECriterion() : creates a loss function module, the Mean Squared Error criterion

Every neural network module in Torch has automatic differentiation. It has a :forward(input) function that computes the output for a given input, flowing the input through the network, and it has a :backward(input, gradient) function that differentiates each neuron in the network with regard to the gradient that is passed in. This is done via the chain rule.
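
As a minimal sketch of these two calls on a single module:

 require 'nn'
 m = nn.Linear(10, 2)                       -- a single Linear module
 input = torch.randn(10)
 output = m:forward(input)                  -- flow the input through the module
 gradOutput = torch.randn(2)                -- gradient w.r.t. the output (illustrative values)
 gradInput = m:backward(input, gradOutput)  -- gradient w.r.t. the input, via the chain rule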

B. Load and normalize data

trainset = torch.load('___.t7') loads a training set.

Now, to prepare the dataset to be used with nn.StochasticGradient:

  1. The dataset has to have a :size() function.

  2. The dataset has to have a [i] index operator, so that dataset[i] returns the ith sample in the dataset.

To do that :

setmetatable(trainset,  -- the setmetatable sets the index operator. 
    {__index = function(t, i) 
                    return {t.data[i], t.label[i]} 
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.

function trainset:size() 
    return self.data:size(1) 
end
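
After this, the dataset exposes exactly the interface nn.StochasticGradient expects, for instance (the index 33 is arbitrary):

print(trainset:size())   -- number of samples
sample = trainset[33]    -- a {input, label} pair, returned via the __index metamethod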

One of the most important things you can do in conditioning your data (in general in data science or machine learning) is to make your data have a mean of 0.0 and a standard deviation of 1.0. The data set is laid out as trainset.data[{ {number of images}, {number of channels (e.g. 3 for RGB)}, {vertical pixels}, {horizontal pixels} }]. So we do:

mean = {} -- store the mean, to normalize the test set in the future
stdv  = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {}  }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction
    
    stdv[i] = trainset.data[{ {}, {i}, {}, {}  }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

C. Normalize natural image

For natural images, we use several intuitive tricks:

  • images are mapped into YUV space, to separate luminance information from color information (1)

  • the luminance channel (Y) is locally normalized, using a contrastive normalization operator: for each neighborhood, defined by a Gaussian kernel, the mean is subtracted and the standard deviation is normalized to one (2)

  • color channels are normalized globally, across the entire dataset; as a result, each color component has zero mean and unit norm across the dataset (3)

To convert RGB to YUV (1) :

for i = 1,trainData:size() do
   trainData.data[i] = image.rgb2yuv(trainData.data[i])
end

To normalize each feature (channel) globally (3), following the same per-channel pattern as in section B above, we first name the channels:

-- Name channels for convenience
channels = {'y','u','v'}
mean = {}
std = {}
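
The per-channel loop itself follows the same pattern as the RGB loop in section B, now over the y/u/v channels (a sketch):

for i,channel in ipairs(channels) do
   mean[i] = trainData.data[{ {},i,{},{} }]:mean()   -- mean estimation
   std[i] = trainData.data[{ {},i,{},{} }]:std()     -- std estimation
   trainData.data[{ {},i,{},{} }]:add(-mean[i])      -- mean subtraction
   trainData.data[{ {},i,{},{} }]:div(std[i])        -- std scaling
end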

To normalize all three channels locally (2):

-- Define the normalization neighborhood:
neighborhood = image.gaussian1D(13)

-- Define our local normalization operator (It is an actual nn module, 
-- which could be inserted into a trainable model):
normalization = nn.SpatialContrastiveNormalization(1, neighborhood, 1):float()

-- Normalize all channels locally :
for c in ipairs(channels) do   -- c is the numeric channel index returned by ipairs
   for i = 1,trainData:size() do
      trainData.data[{ i,{c},{},{} }] = normalization:forward(trainData.data[{ i,{c},{},{} }])
   end
end

D. Training the network

A convolution layer learns its convolution kernels to adapt to the input data and the problem being solved.

A max-pooling layer has no learnable parameters. It only finds the max of local windows.

A layer in Torch which has learnable weights will typically have a .weight field (and optionally, .bias).

m = nn.SpatialConvolution(1,3,2,2) -- learn 3 2x2 kernels

i.e. 1 input channel (a grayscale image, for example), 3 output channels, and a 2x2 convolution kernel.

print(m.weight) -- initially, the weights are randomly initialized
print(m.bias) -- The operation in a convolution layer is: output = convolution(input,weight) + bias

We use nn.StochasticGradient(module, criterion) to let the algorithm adjust the network by itself.

It has a :train(dataset) function that takes a given dataset and simply trains your network by showing different samples from your dataset to the network. The dataset and the neural network must be created beforehand, and the trainer's learningRate field should be set before calling :train().
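
For instance, a minimal sketch (assuming model, criterion and a dataset like trainset above are defined):

trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5      -- number of passes over the dataset
trainer:train(trainset)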

Generally, when you have to deal with image, text, audio or video data, you can use standard functions like image.load or audio.load to load your data into a torch.Tensor or a Lua table.
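
For example, loading an image (the file name here is only illustrative):

require 'image'
img = image.load('example.png')   -- a torch.Tensor of size nChannels x height x width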

Neural Network Examples

Using the nn package, describing ConvNets, MLPs and other forms of sequential trainable models is really easy. All we have to do is create a top-level wrapper, which, as for the logistic regression, is going to be a sequential module, and then append modules into it.

Regular MLP

This model is parametrized by two weight matrices and two bias vectors, where the function sigmoid is typically the symmetric hyperbolic tangent:

y_n = W2 * sigmoid(W1 * x_n + b1) + b2

model = nn.Sequential()
model:add(nn.Reshape(ninputs))
model:add(nn.Linear(ninputs,nhiddens))
model:add(nn.Tanh())
model:add(nn.Linear(nhiddens,noutputs))
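
Filling in concrete sizes and running a forward pass (a sketch; the sizes are illustrative):

require 'nn'
ninputs, nhiddens, noutputs = 32*32*3, 64, 10   -- e.g. a 32x32 RGB image, 10 classes
model = nn.Sequential()
model:add(nn.Reshape(ninputs))
model:add(nn.Linear(ninputs,nhiddens))
model:add(nn.Tanh())
model:add(nn.Linear(nhiddens,noutputs))
y = model:forward(torch.randn(ninputs))         -- y is a vector with noutputs elements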

Convnet

Convolutional Networks are a particular form of MLP, which was tailored to efficiently learn to classify images. The input and output of each stage are sets of arrays called feature maps.

Data must be normalized. Generally, natural images are mapped into YUV to separate luminance (normalized locally) from color (normalized globally).

ConvNet Modules :

  • Filter bank layer:

Each filter detects a particular feature at every location on the input. Input: a 3D array with components denoted x_ijk, where each feature map is denoted x_i. Output: a 3D array y. A trainable filter (kernel) k_ij connects input feature map x_i to output feature map y_j. The module computes y_j = b_j + Σ_i (k_ij ∗ x_i), where ∗ is the 2D discrete convolution operator and b_j is a trainable bias parameter.

  • Non-Linearity Layer :

Generally a pointwise tanh() sigmoid function applied to each site (ijk). Useful for natural image recognition is the rectified sigmoid R_abs: abs(g_i · tanh()), where g_i is a trainable gain parameter, followed by a normalization.

  • Feature Pooling Layer :

It treats each feature map separately and computes, over a neighborhood in each feature map, the average, the L2-norm, or the max (Linf-norm, also known as max pooling). The neighborhoods are stepped by a stride larger than 1 (but smaller than or equal to the pooling neighborhood), so the output is a reduced-resolution feature map. Sometimes the pooling also pools similar features at the same location, in addition to the same feature at nearby locations.

model = nn.Sequential()

-- 10-class problem
noutputs = 10
-- input dimensions
nfeats = 3
width = 32
height = 32
ninputs = nfeats*width*height
-- hidden units, filter sizes (for ConvNet only):
nstates = {16,256,128}
filtsize = 5
poolsize = 2
normkernel = image.gaussian1D(7)

-- stage 1 : filter bank -> squashing -> L2 pooling -> normalization
model:add(nn.SpatialConvolutionMM(nfeats, nstates[1], filtsize, filtsize)) -- filter bank layer
model:add(nn.Tanh()) -- non-linearity layer
model:add(nn.SpatialLPPooling(nstates[1],2,poolsize,poolsize,poolsize,poolsize)) -- feature pooling layer
model:add(nn.SpatialSubtractiveNormalization(nstates[1], normkernel))

-- stage 2 : filter bank -> squashing -> L2 pooling -> normalization
model:add(nn.SpatialConvolutionMM(nstates[1], nstates[2], filtsize, filtsize))
model:add(nn.Tanh())
model:add(nn.SpatialLPPooling(nstates[2],2,poolsize,poolsize,poolsize,poolsize))
model:add(nn.SpatialSubtractiveNormalization(nstates[2], normkernel))

-- stage 3 : standard 2-layer neural network (MLP)
-- after two conv/pool stages on a 32x32 input, the feature maps are 5x5 (= filtsize x filtsize here)
model:add(nn.Reshape(nstates[2]*filtsize*filtsize))
model:add(nn.Linear(nstates[2]*filtsize*filtsize, nstates[3]))
model:add(nn.Tanh())
model:add(nn.Linear(nstates[3], noutputs))

Then, we must define a loss function (a multi-margin criterion, or negative log-likelihood on log-probabilities; with ClassNLLCriterion, the model should output log-probabilities, e.g. by ending with nn.LogSoftMax()):

criterion = nn.MultiMarginCriterion() -- SVM-like
criterion = nn.ClassNLLCriterion() -- negative log-likelihood loss

Finally, we train the model :

-- we train with mini-batch SGD, using the optim package to perform the parameter updates
require 'optim'

parameters,gradParameters = model:getParameters()

-- optimizer settings (the learning rate here is only illustrative)
optimState = {learningRate = 1e-3}

function train()

   -- epoch tracker
   epoch = epoch or 1

   -- shuffle the sample indices at each epoch
   -- (opt.batchSize is assumed to be defined elsewhere, e.g. opt = {batchSize = 128})
   shuffle = torch.randperm(trainData:size())

   -- do one epoch
   for t = 1,trainData:size(),opt.batchSize do
     
     -- create mini batch
      local inputs = {}
      local targets = {}
      for i = t,math.min(t+opt.batchSize-1,trainData:size()) do
         local input = trainData.data[shuffle[i]]
         local target = trainData.labels[shuffle[i]]
         input = input:double()
         table.insert(inputs, input)
         table.insert(targets, target)
      end

      -- create closure to evaluate f(X) and df/dX
      local feval = function(x)
                       -- get new parameters
                       if x ~= parameters then
                          parameters:copy(x)
                       end

                       -- reset gradients
                       gradParameters:zero()

                       -- f is the average of all criterions
                       local f = 0

                       -- evaluate function for complete mini batch
                       for i = 1,#inputs do
                          -- estimate f
                          local output = model:forward(inputs[i])
                          local err = criterion:forward(output, targets[i])
                          f = f + err

                          -- estimate df/dW
                          local df_do = criterion:backward(output, targets[i])
                          model:backward(inputs[i], df_do)
                       end

                       -- normalize gradients and f(X)
                       gradParameters:div(#inputs)
                       f = f/#inputs

                       -- return f and df/dX
                       return f,gradParameters
                    end

      -- do one optimization step on the current mini batch
      -- (optim.sgd calls feval and updates the parameters in place)
      optim.sgd(feval, parameters, optimState)

   end

   -- next epoch
   epoch = epoch + 1
end
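
A possible way to drive the loop (a sketch; opt is a hypothetical settings table, and optimState/shuffle are set up as above):

opt = {batchSize = 128}   -- hypothetical batch size, read inside train()
for e = 1,10 do           -- train for 10 epochs
   train()
end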

Autoencoder
