Compressed Sparse Row (CSR) - Nori12/Machine-Learning-Tutorial GitHub Wiki
Machine Learning Tutorial
Compressed Sparse Row (CSR) or Compressed Row Storage (CRS) or Yale format
Represents a matrix M by three (one-dimensional) arrays, that respectively contain nonzero values. It is similar to COO, but compresses the row indices, hence the name. This format allows fast row access and matrix-vector multiplications.
The CSR format stores a sparse m × n matrix M in row form using three (one-dimensional) arrays (V, COL_INDEX, ROW_INDEX). Let NNZ denote the number of nonzero entries in M.
- The arrays V and COL_INDEX are of length NNZ, and contain the non-zero values and the column indices of those values respectively.
- The array ROW_INDEX has one element per row in the matrix and encodes the index in V where the given row starts. (It may contain an extra end element which is set to NNZ).
For example, the matrix
matrix = np.array[[0, 0, 0, 0],
[5, 8, 0, 0],
[0, 0, 3, 0],
[0, 6, 0, 0]]
is a 4x4 with 4 nonzero element, hence
V = [ 5 8 3 6 ]
COL_INDEX = [ 0 1 2 1 ]
ROW_INDEX = [ 0 0 2 3 4 ]
-
Advantages of the CSR format
- efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
- efficient row slicing
- fast matrix vector products
-
Disadvantages of the CSR format
- slow column slicing operations (consider CSC)
- changes to the sparsity structure are expensive (consider LIL or DOK)