Indexing in MATLAB - hasselmonians/knowledge-base GitHub Wiki

The purpose of this document is to explain indexing in programming languages, using Julia as an example, in order to illustrate the idiosyncrasies of MATLAB (and why they are sometimes useful and sometimes not).

Primitive types

In computer programming, every piece of data has to have a type. This can be defined by the user in the code, or inferred by the compiler. When a piece of data is in a form that cannot be subdivided or indexed, its type is a primitive type.

For example, in Julia, Int64 is the name of a primitive data type that represents a signed integer with 64 bits. If I were using a computer with a 64-bit register (basically any modern one), and I typed

a = 1

The variable binding a would refer to a sequence of 64 bits which are interpreted as a number (in this case, an integer in the interval [-9223372036854775808, 9223372036854775807]).

Array types

If you want to store a lot of data in an ordered sequence, you want to use an array. The simplest arrays are one-dimensional sequences of primitive types. In Julia, you could construct an array of integers or of floating point numbers, and there are a host of ways to do this. For example, to make a one-dimensional array of floating point (decimal) numbers of value 1, you could do

julia> a = ones(3)
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

The type of this variable is parametric: it is an Array constructed with respect to the primitive type Float64 and the dimensionality 1.

Arrays are good for two major reasons: they allow you to access data of the same type sequentially and they allow you to index those same data. In Julia, you can have an Array of any type or types, including custom types. You can also have arrays of arrays. It is important to note that in Julia, Array{Float64, 1} does not equal Float64.

julia> a = ones(1)
1-element Array{Float64,1}:
 1.0

julia> b = 1
1

julia> a == b
false

julia> a[1] == b
true

Passing by value and reference

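Arrays can be indexed to get at the raw data stored within the data structure. This is necessary for manipulating data structures and performing computations. Some programming languages (like MATLAB) pass by value, which means that (a) indexing produces a new value with its own variable binding rather than a view onto the original data, and (b) pointers (if any) are not exposed to the user. In practice, MATLAB delays the copy made by a plain assignment until one of the variables is modified (copy-on-write), so the extra memory is not always used immediately, but the result always behaves as an independent copy.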

A pointer is a piece of data that refers to the location in memory of another piece of data. The idea of a house address is often used in analogy. The address of a house is an unambiguous identifier to the house itself, but doesn't directly contain any information about what's in the house. A pointer points to a piece of data in memory so that you can find it.

When a programming language passes-by-reference, that means that pointers are being used. Data are not copied. Many lower-level programming languages like C, C++, Fortran, Go, and Rust make extensive use of pointers. This is because telling someone your address is way less expensive than letting them examine your entire house. Julia makes some use of passing by reference. For example, if I wanted to manipulate an array a by first creating a new variable binding b and then indexing a value within the array to change its value, the value would be changed in both variables because in Julia a and b point to the same data.

julia> a = ones(3)
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

julia> b = a
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

julia> b[1] = 0
0

julia> b
3-element Array{Float64,1}:
 0.0
 1.0
 1.0

julia> a
3-element Array{Float64,1}:
 0.0
 1.0
 1.0

In contrast, if instead of creating a new variable binding to the same array, I "sliced" into the array using :-notation, I would get a new variable binding which points to different data.

julia> b = a[:]
3-element Array{Float64,1}:
 0.0
 1.0
 1.0

julia> b[1] = 1
1

julia> a
3-element Array{Float64,1}:
 0.0
 1.0
 1.0

julia> b
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

In this way, Julia allows both passing-by-reference and passing-by-value.

MATLAB

MathWorks has made several design choices which set MATLAB apart from other programming languages. These traits make the language very powerful for linear algebra, but can lead to some confusion.

Every variable is an array

Everything in MATLAB is an array, though arrays are usually called matrices and always have a minimum dimensionality of 2. This means that the standard data type in MATLAB for a scalar value is a 1x1 double, where double refers to a double-precision floating point number. Scalars have a matrix representation in MATLAB, though they are afforded some special rules. For instance, scalar multiplication onto matrices is defined, even though scalars are treated as arrays. This is not true in Julia, where a one-element array is not a scalar: you need an actual scalar for it to work.
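The claim that a scalar is really a 1x1 matrix can be checked directly. The following is a minimal sketch; the variable names are made up for illustration.

```matlab
% A "scalar" is stored as a 1x1 matrix of class double.
a = 1;
aClass = class(a);   % 'double'
aSize  = size(a);    % [1 1]
aDims  = ndims(a);   % 2, the minimum dimensionality of any MATLAB array

% Scalar-times-matrix is defined even though a is itself a 1x1 array.
M = ones(2, 3);
P = (2 * a) * M;     % every element of M is scaled by 2
```

Note that * between two non-scalar matrices is matrix multiplication and requires compatible inner dimensions; the 1x1 case is the special rule mentioned above.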

Strings (sequences of characters) are a relatively new concept in MATLAB. Before release R2016b, MATLAB did not have a string type, though it did have "character arrays." These are matrices which contain alphanumeric symbols in the Unicode glyphset (UTF-16). If you want to load a text document in MATLAB, where each line might have a different number of characters (including spaces), you will need a cell array, since the rows of a matrix must all have the same length. Alternatively, you could pad the shorter lines with spaces or some other delimiting character until all lines are the same length (and can be vertically concatenated).
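Both options for storing lines of unequal length can be sketched in a few lines. The text below is invented for illustration, and char is used here as one way to do the space-padding.

```matlab
% Lines of different lengths cannot be stacked directly into a char matrix,
% so keep them in a cell array, one line per cell.
textLines = {'short', 'a longer line', 'mid-length'};
lineLengths = cellfun(@numel, textLines);   % [5 13 10]

% Alternatively, char() pads each row with trailing spaces so that all
% rows have the same length and can form a matrix.
padded = char(textLines{:});                % 3x13 char matrix
paddedSize = size(padded);
firstRow = padded(1, :);                    % 'short' followed by 8 spaces
```

The cell array keeps each line exactly as it was read, while the padded matrix allows ordinary ()-indexing by row and column at the cost of trailing spaces.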

More complex data types, such as Tables and Structs are also arrays in MATLAB.
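For structs, this can be seen by checking the size of a freshly made struct and then growing it by index. This is a small sketch with a hypothetical field name.

```matlab
s = struct('name', 'first');   % a single struct is a 1x1 struct array
sizeBefore = size(s);          % [1 1]

s(2).name = 'second';          % assigning at index 2 grows the array to 1x2
sizeAfter = size(s);           % [1 2]

sub = s(2);                    % ()-indexing returns a (1x1) struct again
subIsStruct = isstruct(sub);
```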

All arrays pass-by-value

Slicing or otherwise indexing an array (matrix) results in a brand-new matrix.
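The MATLAB counterpart of the Julia experiment above might look like the following sketch. Unlike in Julia, plain assignment already behaves as a copy.

```matlab
a = ones(3, 1);

b = a;        % b behaves as an independent copy of a
b(1) = 0;     % modifying b leaves a untouched

c = a(2:3);   % slicing also produces a new, independent array
c(1) = 5;     % a is still [1; 1; 1] after this line
```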

()-indexing vs {}-indexing

MATLAB has two kinds of indexing. The simplest is parenthesis-indexing, which is used to reference values in an array while maintaining the type of the array. For example, indexing a Table will preserve the fact that it's a Table and will result in a table of smaller size.

In contrast, curly brace-notation tries to extract values stored in the array. This is especially common for Cells in MATLAB. A Cell is a type of array that can store any data type.
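The difference is easiest to see on a small cell array. This is a minimal sketch with made-up contents.

```matlab
c = {1, 'two', [3 4 5]};

sub = c(3);              % ()-indexing keeps the container: a 1x1 cell
val = c{3};              % {}-indexing extracts the contents: [3 4 5]

subClass = class(sub);   % 'cell'
valClass = class(val);   % 'double'

firstTwo = c(1:2);       % still a cell array, now 1x2
```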

Operating on arrays

While in Julia, arrays are parametric and the parametrized types are defined by the types of the contained data, there are only really three types of arrays in MATLAB. The first treats the data as numbers (e.g. single, double, etc.), the second as characters, and the third is anything else, usually a cell array.

Matrix mathematics are only defined on the first type of array. Using ()-indexing on a Cell results in another Cell. Using {}-indexing will try to make a matrix of the first or second kind.

For this reason, it's common to see for loops in MATLAB of the following kind:

myOutput = zeros(length(myCellArray), 1);
for ii = 1:length(myCellArray)
    myData = myCellArray{ii};
    % do computation on myData, save to myOutput
    % myOutput(ii) = ...
end
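A concrete instance of this loop pattern, assuming a cell array of numeric vectors and using sum as a stand-in for the computation, could look like this. The cellfun call at the end is an alternative way to write the same thing when each result is a scalar.

```matlab
% hypothetical data: numeric vectors of different lengths
myCellArray = {[1 2 3], [4 5], 6};

myOutput = zeros(numel(myCellArray), 1);
for ii = 1:numel(myCellArray)
    myData = myCellArray{ii};      % {}-indexing extracts a numeric vector
    myOutput(ii) = sum(myData);    % matrix math works on the extracted data
end

% the same computation without an explicit loop
viaCellfun = cellfun(@sum, myCellArray);   % 1x3 row vector
```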

Collecting in arrays

If you have a non-numeric, non-character type, such as a struct with some properties, and you want to acquire a vector or matrix of the properties contained within, you can "collect" the properties using []- or {}-collection.

If you want to collect the properties in a numerical or character array, use []-collection. If instead you need the properties to be in a cell array, you should use {}-collection.

% dir gets information about the files and folders in your current directory
% it returns a struct array
mydir = dir();

[]-collection is great for collecting numbers, but often less useful for character data, because all the values are concatenated into one long vector.

% collect filenames in a character vector
names = [mydir.name]

returns something that looks like

'...BandwidthEstimatorFIt-SNEMLE-time-courseMyDocumentsNEURONRatCatchercpplabhasselmo-tracking'

In comparison, {}-collection, name = {mydir.name}, yields

name =

  1×10 cell array

    {'.'}    {'..'}    {'BandwidthEstima…'}    {'FIt-SNE'}    {'MLE-time-course'}    {'MyDocuments'}    {'NEURON'}    {'RatCatcher'}    {'cpplab'}    {'hasselmo-tracking'}

This only produces 1 x n vectors. If you want to maintain some sort of multi-dimensional shape, you will need to reshape.
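As a sketch of the reshape step, assume a struct array with a numeric field v (both names are hypothetical). Collection flattens to 1 x n even when the struct array itself is two-dimensional.

```matlab
s = struct('v', {1, 2, 3, 4, 5, 6});   % 1x6 struct array
s2 = reshape(s, 2, 3);                 % the same structs arranged as 2x3

v = [s2.v];                            % collection is still 1x6
vSize = size(v);

m = reshape(v, size(s2));              % restore the shape: [1 3 5; 2 4 6]
```

Both the collection and reshape work in column-major order, so m(i, j) corresponds to s2(i, j).v.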

This can be powerful syntax. Consider the situation where you want to find all the folders but not any files. The 1 x 1 logical property isdir of the struct returned by dir can help us. In fact, we can perform logical indexing with it if we collect first.

dirinfo = dir();
dirinfo(~[dirinfo.isdir]) = [];
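Because the output of dir depends on the current folder, the same idea can be tried on a hand-built struct array with the same two fields. The listing below is invented for illustration.

```matlab
% a stand-in for the output of dir(), with only the fields needed here
listing = struct('name',  {'.', 'notes.txt', 'data', 'run.m'}, ...
                 'isdir', {true, false, true, false});

isFolder = [listing.isdir];      % []-collection gives a 1x4 logical vector
folders = listing(isFolder);     % logical ()-indexing keeps only the folders
folderNames = {folders.name};    % {}-collection of the remaining names
```

This version selects the folders into a new variable rather than deleting the non-folders in place; both rely on collecting isdir first.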

()-interpolation for classes and structures

Say you have a data structure which has many properties. You want to do something for each one of those properties. You need to access data stored in the structure by iterating over a list of its fields.

If we had a neuronal network model in the form of a xolotl object x, with the names of its compartments in a cell array, we could operate on each compartment using dot-notation. Instead of writing out each field name, we loop over the names and wrap each one in parentheses, as in x.('some_character_vector'), which forces MATLAB to treat the character vector as a field of the data structure (the MATLAB documentation calls this a dynamic field name). Here is a concrete example, where we want to call the add function of each compartment with some arbitrary arguments.

x; % xolotl object
% get a cell array of the compartment names
compartments = x.find('compartment');
% iterate over these and interpolate
for ii = 1:length(compartments)
    x.(compartments{ii}).add(...);
end
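The same technique works on an ordinary struct, which makes it easy to try without xolotl. In this sketch the struct and its field names are hypothetical, and fieldnames supplies the list to iterate over.

```matlab
params = struct('gain', 2, 'offset', 5, 'scale', 10);
fields = fieldnames(params);     % 3x1 cell array of character vectors

total = 0;
for ii = 1:numel(fields)
    % params.(fields{ii}) reads the field whose name is stored in fields{ii}
    total = total + params.(fields{ii});
    % the same syntax works on the left-hand side of an assignment
    params.(fields{ii}) = 2 * params.(fields{ii});
end
```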

Unpacking a data structure

If you have a struct or other object with fields that is in a non-scalar array, you can unpack those into variables by using left-hand-side []-notation.

% a very contrived struct
a = struct('b', {1,2,3})
% unpack into three variables (each 1x1 doubles)
[b1, b2, b3] = a.b

If instead you have a scalar object with many fields (such as a struct that contains options to pass to a function), you should use the struct2vec function instead. Note that struct2vec is not a built-in MATLAB function, so it needs to be available on your path.
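For a scalar struct, one approach that needs only built-in functions is to go through struct2cell and unpack the resulting cell array. This is a sketch of that alternative, not of struct2vec itself, and the option names are hypothetical.

```matlab
opts = struct('tol', 1e-3, 'maxIter', 50);

vals = struct2cell(opts);    % 2x1 cell array of field values, in field order
[tol, maxIter] = vals{:};    % unpack the values into separate variables
```

The variables on the left-hand side must be listed in the same order as the fields of the struct.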

Conclusion

MATLAB has very powerful tools for matrix mathematics. Use ()-notation on any data structure to index it and preserve the type of the data, creating a new variable binding and allocating memory. Use {}-notation when you want to extract from a "parametric" data type in MATLAB, such as Cells or Tables.