Compressed Sparse Row (CSR) - Nori12/Machine-Learning-Tutorial GitHub Wiki

Machine Learning Tutorial

Compressed Sparse Row (CSR) or Compressed Row Storage (CRS) or Yale format

Represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values. It is similar to COO, but compresses the row indices, hence the name. This format allows fast row access and matrix-vector multiplications.

The CSR format stores a sparse m × n matrix M in row form using three (one-dimensional) arrays (V, COL_INDEX, ROW_INDEX). Let NNZ denote the number of nonzero entries in M.

  • The arrays V and COL_INDEX are of length NNZ, and contain the non-zero values and the column indices of those values respectively.
  • The array ROW_INDEX has one element per row in the matrix and encodes the index in V where the given row starts. (It may contain an extra end element which is set to NNZ).

For example, the matrix

matrix = np.array[[0, 0, 0, 0],
                  [5, 8, 0, 0],
                  [0, 0, 3, 0],
                  [0, 6, 0, 0]]

is a 4x4 with 4 nonzero element, hence

        V = [ 5 8 3 6 ]
COL_INDEX = [ 0 1 2 1 ]
ROW_INDEX = [ 0 0 2 3 4 ]
  • Advantages of the CSR format

    • efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
    • efficient row slicing
    • fast matrix vector products
  • Disadvantages of the CSR format

    • slow column slicing operations (consider CSC)
    • changes to the sparsity structure are expensive (consider LIL or DOK)