4. Torch
Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation.
Object Method
If z is a Torch object, we can apply its methods by using : instead of the . used in Python. For example: z:size().
Matrix, arrays, tensors
Indexing begins at 1. To access an element of a 3D tensor, we use x[3][4][5]. The entire set of a 3D array is x[{ {}, {}, {} }]. In order to pick a range of elements along one dimension (from i1 to i2, both included), we can do x[{ {i1, i2}, {}, {} }].
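For example, a minimal sketch with arbitrary sizes:
-- a 3D tensor filled with random values
x = torch.randn(3, 4, 5)
print(x[3][4][5])          -- a single element (indices start at 1)
y = x[{ {}, {}, {} }]      -- the whole tensor
z = x[{ {1, 2}, {}, {} }]  -- rows 1 to 2 (both included) along the first dimension
print(z:size())            -- 2x4x5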
Numpy users
Types
Numpy | Torch |
---|---|
np.ndarray | torch.Tensor |
np.float32 | torch.FloatTensor |
np.float64 | torch.DoubleTensor |
np.uint8 | torch.ByteTensor |
Constructors
Numpy | Torch |
---|---|
np.empty([2,2]) | torch.Tensor(2,2) |
np.eye | torch.eye |
np.ones | torch.ones |
np.array([ [1,2],[3,4] ]) | torch.Tensor({{1,2},{3,4}}) |
np.ascontiguousarray(x) | x:contiguous() |
np.copy(x) | x:clone() |
Numerical Ranges
Numpy | Torch |
---|---|
np.arange(10) | torch.linspace(0,9,10) |
np.arange(2, 3, 0.1) | torch.linspace(2, 2.9, 10) |
Attributes and Methods
Numpy | Torch |
---|---|
x.shape | x:size() |
x.ndim | x:dim() |
x.data | x:data() |
x.size | x:nElement() |
x.size == y.size | x:isSameSizeAs(y) |
x.reshape | x:reshape() |
x.transpose | x:transpose() |
Item selection and manipulation
Numpy | Torch |
---|---|
np.take(a, indices) | a[indices] |
x[:,0] | x[{{},1}] |
np.sort | sorted, indices = torch.sort(x, [dim]) |
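A couple of these equivalences in action (a small sketch; the values are arbitrary):
a = torch.linspace(0, 9, 10)      -- like np.arange(10): 10 values from 0 to 9
x = torch.Tensor({3, 1, 2})
sorted, indices = torch.sort(x)   -- like np.sort, but also returns the original indices
print(sorted)                     -- 1 2 3
print(indices)                    -- 2 3 1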
Tensor
A Tensor is probably the most important Torch class, because it handles numeric data. Tensors are serializable (meaning they can be translated into a format that can be stored). A tensor is a kind of multi-dimensional matrix. There are several types of tensors (byte, int, float, double, ...).
To define a Tensor :
--- creation of a 4D-tensor 4x5x6x2
z = torch.Tensor(4,5,6,2)
--- for more dimensions, (here a 6D tensor) one can do:
s = torch.LongStorage(6)
s[1] = 4; s[2] = 5; s[3] = 6; s[4] = 2; s[5] = 7; s[6] = 3;
x = torch.Tensor(s) ---[torch.DoubleTensor of size 4x5x6x2x7x3]
One could say that a Tensor is a particular way of viewing a Storage: a Storage only represents a chunk of memory, while the Tensor interprets this chunk of memory as having dimensions. Storages are basically a way for Lua to access the memory of a C pointer or array. Storages can also map the contents of a file to memory. A Storage is an array of basic C types. For arrays of Torch objects, use Lua tables. x:storage() returns the Storage used to store all the elements of the Tensor x, i.e. all of the data contained in x.
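A small sketch of the Tensor/Storage relationship (sizes are arbitrary):
x = torch.Tensor(2, 3):fill(1)
s = x:storage()      -- the underlying flat chunk of memory, here 6 elements
print(s:size())      -- 6
s[1] = 7             -- writing to the Storage also changes the Tensor
print(x[1][1])       -- 7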
Constructor
torch.Tensor() returns an empty tensor. torch.Tensor(sizes, [strides]) creates a tensor with any number of dimensions. torch.Tensor(n) creates a vector of size n.
Functions
- z:fill(n) fills the tensor with the value n.
- z:t() transposes z.
- z:nDimension() returns the number of dimensions.
- z:size() returns the size.
- z:stride(n) returns the jump necessary to go from one element to the next one in the specified dimension n. Note also that, in Torch, elements in the same row (elements along the last dimension for a general tensor) are contiguous in memory.
- x:isContiguous() returns true if the elements of the Tensor are contiguous in memory.
- dst = src:type('torch.TypeTensor') converts the data type; shortcuts are available, e.g. trainData.data = trainData.data:float() converts to Float.
- Tensors can be moved onto the GPU using the :cuda() function, after require 'cutorch'.
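A quick sketch exercising a few of these functions (sizes and values are arbitrary):
z = torch.Tensor(4, 5):fill(2)    -- 4x5 tensor filled with 2
print(z:nDimension())             -- 2
print(z:size())                   -- 4 5
print(z:stride(1))                -- 5: jump needed to move to the next row
print(z:isContiguous())           -- true
zt = z:t()                        -- transposed view, not a copy
print(zt:isContiguous())          -- false
f = z:float()                     -- same as z:type('torch.FloatTensor')
print(f:type())                   -- torch.FloatTensor
-- with cutorch installed, z:cuda() would move the tensor to the GPU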
Neural networks in Torch can be constructed using the nn package: require 'nn'.
Modules
Modules are the bricks used to build neural networks. Each module is itself a neural network, but modules can be combined with containers to create complex neural networks.
- model = nn.Sequential() : makes a cascade (container) module
- model:add(nn.Reshape(ninputs)) : reshapes the input to a vector
- model:add(nn.Linear(ninputs,nhiddens)) : adds a Linear module with ninputs inputs and nhiddens hidden units
- model:add(nn.Tanh()) : adds a tanh module
- model:add(nn.ReLU()) : adds the ReLU activation function (rectifier: f(x) = max(0, x))
- criterion = nn.ClassNLLCriterion() : creates a loss-function module, a negative log-likelihood criterion for multi-class classification
- criterion = nn.MSECriterion() : creates a loss-function module, the mean squared error criterion
Every neural network module in Torch has automatic differentiation. It has a :forward(input) function that computes the output for a given input, flowing the input through the network, and a :backward(input, gradient) function that differentiates each neuron in the network with respect to the gradient that is passed in. This is done via the chain rule.
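A minimal sketch of this forward/backward pattern, assuming a tiny two-layer classifier (the sizes and the target are arbitrary):
require 'nn'
model = nn.Sequential()
model:add(nn.Linear(10, 3))
model:add(nn.LogSoftMax())                    -- ClassNLLCriterion expects log-probabilities
criterion = nn.ClassNLLCriterion()
input  = torch.randn(10)
target = 2                                    -- class label in {1, 2, 3}
output = model:forward(input)                 -- forward pass through the network
loss   = criterion:forward(output, target)    -- loss value
df_do  = criterion:backward(output, target)   -- gradient of the loss w.r.t. the output
model:backward(input, df_do)                  -- backpropagation via the chain rule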
trainset = torch.load('___.t7') loads a training set. Now, to prepare the dataset to be used with nn.StochasticGradient:
- The dataset has to have a :size() function.
- The dataset has to have an [i] index operator, so that dataset[i] returns the i-th sample in the dataset.
To do that :
setmetatable(trainset, -- the setmetatable sets the index operator.
{__index = function(t, i)
return {t.data[i], t.label[i]}
end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.
function trainset:size()
return self.data:size(1)
end
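With these two pieces in place, the dataset behaves as nn.StochasticGradient expects (this sketch assumes the trainset fields used above):
print(trainset:size())    -- number of samples
sample = trainset[33]     -- {input, label} pair returned by the __index function
print(sample[1]:size())   -- size of the 33rd image
print(sample[2])          -- its label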
One of the most important things you can do when conditioning your data (in data science or machine learning in general) is to make it have a mean of 0.0 and a standard deviation of 1.0. Here the data is indexed as trainset.data[{ {image index}, {channel (e.g. 3 channels for RGB)}, {vertical pixels}, {horizontal pixels} }]. So we do:
mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
print('Channel ' .. i .. ', Mean: ' .. mean[i])
trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
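The same statistics should then be applied to the test data. A sketch, assuming a testset loaded with the same layout and already converted to a DoubleTensor:
for i=1,3 do -- over each image channel
testset.data[{ {}, {i}, {}, {} }]:add(-mean[i])  -- subtract the *training* mean
testset.data[{ {}, {i}, {}, {} }]:div(stdv[i])   -- divide by the *training* stdv
end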
For natural images, we use several intuitive tricks:
- images are mapped into YUV space, to separate luminance information from color information (1)
- the luminance channel (Y) is locally normalized, using a contrastive normalization operator: for each neighborhood, defined by a Gaussian kernel, the mean is suppressed and the standard deviation is normalized to one (2)
- color channels are normalized globally, across the entire dataset; as a result, each color component has zero mean and unit norm across the dataset (3)
To convert RGB to YUV (1):
require 'image' -- the image package provides rgb2yuv
for i = 1,trainData:size() do
trainData.data[i] = image.rgb2yuv(trainData.data[i])
end
To normalize each feature (channel) globally (3), following the same idea as the mean/stdv loop above (the loop itself is sketched just below):
-- Name channels for convenience
channels = {'y','u','v'}
mean = {}
std = {}
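A sketch of the per-channel global normalization, reusing the mean and std tables just defined (same pattern as the earlier mean/stdv loop):
for i,channel in ipairs(channels) do
   -- normalize channel i of every training image with one global mean/std
   mean[i] = trainData.data[{ {},i,{},{} }]:mean()
   std[i]  = trainData.data[{ {},i,{},{} }]:std()
   trainData.data[{ {},i,{},{} }]:add(-mean[i])
   trainData.data[{ {},i,{},{} }]:div(std[i])
end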
To normalize all three channels locally (2):
-- Define the normalization neighborhood:
neighborhood = image.gaussian1D(13)
-- Define our local normalization operator (It is an actual nn module,
-- which could be inserted into a trainable model):
normalization = nn.SpatialContrastiveNormalization(1, neighborhood, 1):float()
-- Normalize all channels locally :
for c in ipairs(channels) do -- c is the channel index (ipairs yields the index first)
for i = 1,trainData:size() do
trainData.data[{ i,{c},{},{} }] = normalization:forward(trainData.data[{ i,{c},{},{} }])
end
end
A convolution layer learns its convolution kernels to adapt to the input data and the problem being solved.
A max-pooling layer has no learnable parameters. It only finds the max of local windows.
A layer in Torch which has learnable weights will typically have the fields .weight (and, optionally, .bias).
m = nn.SpatialConvolution(1,3,2,2) -- learn 3 2x2 kernels
Here there is 1 input plane (for example a grayscale image), 3 output planes, and a 2x2 convolution kernel.
print(m.weight) -- initially, the weights are randomly initialized
print(m.bias) -- The operation in a convolution layer is: output = convolution(input,weight) + bias
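For instance, a sketch with a random single-channel 5x5 input:
input  = torch.randn(1, 5, 5)   -- 1 input plane, 5x5
output = m:forward(input)
print(output:size())            -- 3x4x4: three output maps, each shrunk by the 2x2 kernel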
We use nn.StochasticGradient(module, criterion) to let the algorithm adjust the parameters itself. It has a :train(dataset) function that takes a given dataset and simply trains your network by showing it different samples from the dataset. The dataset and the neural network must be created beforehand, and the learningRate field of the trainer must be set before calling :train().
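A typical usage sketch (the learning rate and number of iterations are illustrative values):
trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = 0.001   -- must be set before training
trainer.maxIteration = 5       -- number of passes over the dataset
trainer:train(trainset)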
Generally, when you have to deal with image, text, audio or video data, you can use standard functions like image.load or audio.load to load your data into a torch.Tensor or a Lua table.
Using the nn package, describing ConvNets, MLPs and other forms of sequential trainable models is really easy. All we have to do is create a top-level wrapper, which, as for the logistic regression, is going to be a sequential module, and then append modules into it.
This model is parametrized by two weight matrices and two bias vectors, where the sigmoid function is typically the symmetric hyperbolic tangent:
y_n = W2 * sigmoid(W1 * x_n + b1) + b2
model = nn.Sequential()
model:add(nn.Reshape(ninputs))
model:add(nn.Linear(ninputs,nhiddens))
model:add(nn.Tanh())
model:add(nn.Linear(nhiddens,noutputs))
Convolutional networks are a particular form of MLP, tailored to efficiently learn to classify images. The input and output of each stage are sets of arrays called feature maps.
Data must be normalized. Generally, natural images are mapped into YUV space to separate luminance (normalized locally) from color (normalized globally).
ConvNet Modules :
- Filter bank layer: each filter detects a particular feature at every location on the input. Input: a 3D array with components denoted x_ijk, where each feature map is denoted x_i. Output: a 3D array y. A trainable filter (kernel) k_ij connects input feature map x_i to output feature map y_j. The module computes y_j = b_j + sum_i k_ij * x_i, where * is the 2D discrete convolution operator and b_j is a trainable bias parameter.
- Non-linearity layer: generally a tanh() sigmoid function applied to each site (ijk). For natural image recognition, a common choice is the rectified sigmoid R_abs: abs(tanh(g_i)), where g_i is a trainable gain parameter, followed by a normalization.
- Feature pooling layer: it treats each feature map separately and computes the average values over a neighborhood in each feature map, based on the L2 norm or the Linf norm (also known as max pooling). The neighborhoods are stepped by a stride larger than 1 (but smaller than or equal to the pooling neighborhood). Output: a reduced-resolution feature map. Sometimes, the pooling also pools similar features at the same location, in addition to the same feature at nearby locations.
model = nn.Sequential()
-- 10-class problem
noutputs = 10
-- input dimensions
nfeats = 3
width = 32
height = 32
ninputs = nfeats*width*height
-- hidden units, filter sizes (for ConvNet only):
nstates = {16,256,128}
filtsize = 5
poolsize = 2
normkernel = image.gaussian1D(7)
-- stage 1 : filter bank -> squashing -> L2 pooling -> normalization
model:add(nn.SpatialConvolutionMM(nfeats, nstates[1], filtsize, filtsize)) -- filter bank layer
model:add(nn.Tanh()) -- non-linearity layer
model:add(nn.SpatialLPPooling(nstates[1],2,poolsize,poolsize,poolsize,poolsize)) --feature pooling layer
model:add(nn.SpatialSubtractiveNormalization(nstates[1], normkernel))
-- stage 2 : filter bank -> squashing -> L2 pooling -> normalization
model:add(nn.SpatialConvolutionMM(nstates[1], nstates[2], filtsize, filtsize))
model:add(nn.Tanh())
model:add(nn.SpatialLPPooling(nstates[2],2,poolsize,poolsize,poolsize,poolsize))
model:add(nn.SpatialSubtractiveNormalization(nstates[2], normkernel))
-- stage 3 : standard 2-layer neural network (MLP)
model:add(nn.Reshape(nstates[2]*filtsize*filtsize))
model:add(nn.Linear(nstates[2]*filtsize*filtsize, nstates[3]))
model:add(nn.Tanh())
model:add(nn.Linear(nstates[3], noutputs))
Then, we must define a loss function: either a multi-margin criterion or a negative log-likelihood criterion (for the latter, the model must end with nn.LogSoftMax() so that it outputs log-probabilities).
criterion = nn.MultiMarginCriterion() -- SVM-like
criterion = nn.ClassNLLCriterion() -- negative log-likelihood loss
Finally, we train the model :
require 'optim' -- the optim package provides the SGD step used inside the training loop
parameters,gradParameters = model:getParameters()
function train()
-- epoch tracker
epoch = epoch or 1
-- visit the training samples in a random order at each epoch
shuffle = torch.randperm(trainData:size())
-- do one epoch
for t = 1,trainData:size(),opt.batchSize do
-- create mini batch
local inputs = {}
local targets = {}
for i = t,math.min(t+opt.batchSize-1,trainData:size()) do
local input = trainData.data[shuffle[i]]
local target = trainData.labels[shuffle[i]]
input = input:double()
table.insert(inputs, input)
table.insert(targets, target)
end
-- create closure to evaluate f(X) and df/dX
local feval = function(x)
-- get new parameters
if x ~= parameters then
parameters:copy(x)
end
-- reset gradients
gradParameters:zero()
-- f is the average of all criterions
local f = 0
-- evaluate function for complete mini batch
for i = 1,#inputs do
-- estimate f
local output = model:forward(inputs[i])
local err = criterion:forward(output, targets[i])
f = f + err
-- estimate df/dW
local df_do = criterion:backward(output, targets[i])
model:backward(inputs[i], df_do)
end
-- normalize gradients and f(X)
gradParameters:div(#inputs)
f = f/#inputs
-- return f and df/dX
return f,gradParameters
end
-- perform one SGD step on the current mini-batch (opt.learningRate is assumed
-- to be defined alongside opt.batchSize)
optim.sgd(feval, parameters, {learningRate = opt.learningRate})
end
-- next epoch
epoch = epoch + 1
end
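A sketch of how this function might be driven and evaluated afterwards; opt.maxEpochs and a testData table with the same layout as trainData are assumptions, not part of the snippet above:
for e = 1, opt.maxEpochs do
   train()
end
-- rough accuracy check on held-out data
correct = 0
for i = 1, testData:size() do
   local output = model:forward(testData.data[i]:double())
   local _, argmax = torch.max(output, 1)   -- index of the highest score = predicted class
   if argmax[1] == testData.labels[i] then correct = correct + 1 end
end
print('test accuracy: ' .. correct / testData:size())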