colRowTabulates_subset - HenrikBengtsson/matrixStats GitHub Wiki

matrixStats: Benchmark report


colTabulates() and rowTabulates() benchmarks on subsetted computation

This report benchmarks the performance of colTabulates() and rowTabulates() on subsetted computation, that is, when only a subset of the rows and columns is tabulated.
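Three variants are timed throughout this report: tabulating a pre-extracted submatrix (X_S), passing rows and cols to the function, and subsetting inline with X[rows, cols]. The following is a minimal sketch, assuming the matrixStats package is installed and using a small made-up matrix rather than the benchmark data, that checks the three variants produce the same counts:

```r
library(matrixStats)

set.seed(1)
# Illustrative integer matrix; the size and value range are arbitrary choices
X <- matrix(sample(-10:10, size = 100 * 100, replace = TRUE),
            nrow = 100, ncol = 100)

# Random 70% subsets of rows and columns, mirroring the benchmark setup
rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
X_S <- X[rows, cols]

counts_pre    <- colTabulates(X_S)                          # pre-extracted submatrix
counts_args   <- colTabulates(X, rows = rows, cols = cols)  # subset via arguments
counts_inline <- colTabulates(X[rows, cols])                # subset inline

# Compare the counts with dimnames dropped
same <- isTRUE(all.equal(unname(counts_pre), unname(counts_args))) &&
  isTRUE(all.equal(unname(counts_pre), unname(counts_inline)))
print(same)
```

Since the results agree, the timings below only reflect where the subsetting happens, not a difference in what is computed.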

Data

> rmatrix <- function(nrow, ncol, mode = c("logical", "double", "integer", "index"),
+                     range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     n <- nrow * ncol
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else if (mode == "index") {
+         # Sequential indices 1..n, stored as integers
+         x <- seq_len(n)
+         mode <- "integer"
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     # Coerce to the requested type; for "integer", doubles are truncated toward zero
+     storage.mode(x) <- mode
+     # Optionally replace a random fraction na_prob of the elements with NA
+     if (na_prob > 0)
+         x[sample(n, size = na_prob * n)] <- NA
+     dim(x) <- c(nrow, ncol)
+     x
+ }
> rmatrices <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rmatrix(nrow = scale * 1, ncol = scale * 1, ...)
+     data[[2]] <- rmatrix(nrow = scale * 10, ncol = scale * 10, ...)
+     data[[3]] <- rmatrix(nrow = scale * 100, ncol = scale * 1, ...)
+     data[[4]] <- t(data[[3]])
+     data[[5]] <- rmatrix(nrow = scale * 10, ncol = scale * 100, ...)
+     data[[6]] <- t(data[[5]])
+     names(data) <- sapply(data, FUN = function(x) paste(dim(x), collapse = "x"))
+     data
+ }
> data <- rmatrices(mode = "integer", range = c(-10, 10))
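Note how the "integer" matrices are produced: rmatrix() draws doubles with runif() and then sets storage.mode(x) to "integer", which truncates toward zero. With range = c(-10, 10), the tabulated values should therefore fall in -9..9, with 0 roughly twice as frequent as any other value. A small base-R sketch of that step (the sample size of 1e5 is an arbitrary choice for illustration):

```r
set.seed(1)
x <- runif(1e5, min = -10, max = 10)  # doubles strictly inside (-10, 10)
storage.mode(x) <- "integer"          # truncates toward zero, e.g. -9.7 -> -9

print(range(x))

counts <- table(x)
# Zero collects draws from (-1, 1), an interval twice as wide as any other bin
ratio <- counts[["0"]] / mean(counts[names(counts) != "0"])
print(ratio)
```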

Results

10x10 matrix

> X <- data[["10x10"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3205801 171.3    5709258 305.0  5709258 305.0
Vcells 6502589  49.7   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204525 171.2    5709258 305.0  5709258 305.0
Vcells 6498843  49.6   22343563 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x10 data. The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             0.087540    0.0889065   0.0980791   0.0896510   0.0908495   0.429847
2 colTabulates(X, rows, cols)  0.089037    0.0903495   0.0971355   0.0909390   0.0921390   0.219770
3 colTabulates(X[rows, cols])  0.089401    0.0902440   0.1008590   0.0912245   0.0924605   0.288946

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.000000    1.000000    1.0000000   1.000000    1.000000    1.0000000
2 colTabulates(X, rows, cols)  1.017101    1.016231    0.9903792   1.014367    1.014194    0.5112749
3 colTabulates(X[rows, cols])  1.021259    1.015044    1.0283440   1.017551    1.017733    0.6722066
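
The bottom panel appears to be the top panel with each column divided by its first-row value. A minimal sketch that reproduces a few of the relative values from the numbers in the table above (only min, median and max are copied, to keep it short):

```r
# Absolute timings (ms) copied from the 10x10 colTabulates table
abs_ms <- data.frame(
  expr   = c("colTabulates_X_S",
             "colTabulates(X, rows, cols)",
             "colTabulates(X[rows, cols])"),
  min    = c(0.087540, 0.089037, 0.089401),
  median = c(0.0896510, 0.0909390, 0.0912245),
  max    = c(0.429847, 0.219770, 0.288946)
)

# Divide every statistic by the value in the first row
rel <- abs_ms
stat_cols <- c("min", "median", "max")
rel[stat_cols] <- lapply(abs_ms[stat_cols], function(v) v / v[1])
print(rel, digits = 7)
```

Because each column is scaled by its own first-row value, a single slow evaluation in the reference row can push the relative max (and mean) of the other rows well below 1 even when their medians are slower.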

Table: Benchmarking of rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 10x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             0.085704    0.0866480   0.0883837   0.0874370   0.0881800   0.102645
3 rowTabulates(X[cols, rows])  0.086742    0.0880295   0.0892940   0.0885225   0.0892075   0.130317
2 rowTabulates(X, cols, rows)  0.086714    0.0881100   0.0931348   0.0885260   0.0894770   0.420335

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
3 rowTabulates(X[cols, rows])  1.012112    1.015944    1.010300    1.012415    1.011652    1.269589
2 rowTabulates(X, cols, rows)  1.011785    1.016873    1.053755    1.012455    1.014709    4.095036

Figure: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x10 data, as well as rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 10x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows times relative to the first row.

  expr              min         lq          mean        median      uq          max
2 rowTabulates_X_S  85.704      86.6480     88.38371    87.437      88.1800     102.645
1 colTabulates_X_S  87.540      88.9065     98.07909    89.651      90.8495     429.847

  expr              min         lq          mean        median      uq          max
2 rowTabulates_X_S  1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
1 colTabulates_X_S  1.021423    1.026065    1.109697    1.025321    1.030273    4.187705

Figure: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 10x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

100x100 matrix

> X <- data[["100x100"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3203486 171.1    5709258 305.0  5709258 305.0
Vcells 6167977  47.1   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3203477 171.1    5709258 305.0  5709258 305.0
Vcells 6173055  47.1   22343563 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x100 data. The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             0.253419    0.2609700   0.2726863   0.2656830   0.280109    0.362675
2 colTabulates(X, rows, cols)  0.262151    0.2719990   0.2841614   0.2791645   0.293225    0.412909
3 colTabulates(X[rows, cols])  0.261782    0.2737665   0.2917133   0.2794695   0.294835    0.622681

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
2 colTabulates(X, rows, cols)  1.034457    1.042262    1.042082    1.050743    1.046825    1.138510
3 colTabulates(X[rows, cols])  1.033001    1.049034    1.069776    1.051891    1.052572    1.716912

Table: Benchmarking of rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 100x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             0.276059    0.2857250   0.2933003   0.2923040   0.2996070   0.335534
2 rowTabulates(X, cols, rows)  0.285986    0.2949865   0.3022689   0.3019410   0.3090690   0.323716
3 rowTabulates(X[cols, rows])  0.283183    0.2971275   0.3120059   0.3023895   0.3084625   1.172120

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.0000000
2 rowTabulates(X, cols, rows)  1.035960    1.032414    1.030578    1.032969    1.031581    0.9647785
3 rowTabulates(X[cols, rows])  1.025806    1.039907    1.063776    1.034504    1.029557    3.4932973

Figure: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x100 data, as well as rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 100x100 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows times relative to the first row.

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  253.419     260.970     272.6863    265.683     280.109     362.675
2 rowTabulates_X_S  276.059     285.725     293.3003    292.304     299.607     335.534

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.000000    1.000000    1.000000    1.000000    1.000000    1.0000000
2 rowTabulates_X_S  1.089338    1.094858    1.075596    1.100198    1.069609    0.9251644

Figure: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 100x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x10 matrix

> X <- data[["1000x10"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204238 171.2    5709258 305.0  5709258 305.0
Vcells 6172045  47.1   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204229 171.2    5709258 305.0  5709258 305.0
Vcells 6177123  47.2   22343563 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x10 data. The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             0.233580    0.2362520   0.2447179   0.2401835   0.2475065   0.329554
3 colTabulates(X[rows, cols])  0.241319    0.2446845   0.2515279   0.2484270   0.2547075   0.311018
2 colTabulates(X, rows, cols)  0.241028    0.2443960   0.2583131   0.2500805   0.2600480   0.421301

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.0000000
3 colTabulates(X[rows, cols])  1.033132    1.035693    1.027828    1.034322    1.029094    0.9437543
2 colTabulates(X, rows, cols)  1.031886    1.034472    1.055554    1.041206    1.050671    1.2783975

Table: Benchmarking of rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 1000x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             0.276673    0.2801780   0.2889631   0.2851215   0.2919005   0.452778
3 rowTabulates(X[cols, rows])  0.287231    0.2913165   0.3008558   0.2962890   0.3055805   0.403375
2 rowTabulates(X, cols, rows)  0.286890    0.2910215   0.2980739   0.2965630   0.3026750   0.340465

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.0000000
3 rowTabulates(X[cols, rows])  1.038161    1.039755    1.041157    1.039168    1.046865    0.8908891
2 rowTabulates(X, cols, rows)  1.036928    1.038702    1.031529    1.040129    1.036912    0.7519469

Figure: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x10 data, as well as rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 1000x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows times relative to the first row.

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  233.580     236.252     244.7179    240.1835    247.5065    329.554
2 rowTabulates_X_S  276.673     280.178     288.9631    285.1215    291.9005    452.778

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
2 rowTabulates_X_S  1.184489    1.185929    1.180801    1.187099    1.179365    1.373911

Figure: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 1000x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

10x1000 matrix

> X <- data[["10x1000"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204444 171.2    5709258 305.0  5709258 305.0
Vcells 6172952  47.1   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204435 171.2    5709258 305.0  5709258 305.0
Vcells 6178030  47.2   22345455 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x1000 data. The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             0.297480    0.3047505   2.8225353   0.3099065   0.3148250   250.891787
3 colTabulates(X[rows, cols])  0.304955    0.3150990   0.3232788   0.3190925   0.3238450   0.573301
2 colTabulates(X, rows, cols)  0.305951    0.3151110   0.3291838   0.3212190   0.3255665   0.693542

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.000000    1.000000    1.0000000   1.000000    1.000000    1.0000000
3 colTabulates(X[rows, cols])  1.025128    1.033957    0.1145349   1.029641    1.028651    0.0022851
2 colTabulates(X, rows, cols)  1.028476    1.033997    0.1166270   1.036503    1.034119    0.0027643
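
The relative mean and max in this table look anomalous because colTabulates_X_S had a single slow evaluation (max 250.89 ms against a median of 0.31 ms). Assuming the default of 100 evaluations per expression (the microbenchmark() calls above do not set times), removing that one observation brings the mean back to about 0.317 ms, in line with the median. The median is therefore the safer column to compare here. A worked check using only numbers from the table:

```r
n        <- 100L        # assumed: microbenchmark's default number of evaluations
mean_all <- 2.8225353   # mean for colTabulates_X_S (ms), from the table
max_obs  <- 250.891787  # slowest single evaluation (ms), from the table

# Mean of the remaining n - 1 evaluations once the outlier is removed
mean_rest <- (n * mean_all - max_obs) / (n - 1)
print(round(mean_rest, 4))
```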

Table: Benchmarking of rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 10x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             0.291203    0.3048035   0.3166833   0.3105615   0.3189695   0.485174
3 rowTabulates(X[cols, rows])  0.302955    0.3157510   0.3264460   0.3193135   0.3318315   0.524493
2 rowTabulates(X, cols, rows)  0.301042    0.3162710   0.3273503   0.3213555   0.3335205   0.444905

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.0000000
3 rowTabulates(X[cols, rows])  1.040357    1.035917    1.030828    1.028181    1.040324    1.0810410
2 rowTabulates(X, cols, rows)  1.033787    1.037623    1.033683    1.034756    1.045619    0.9170009

Figure: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x1000 data, as well as rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 10x1000 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows times relative to the first row.

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  297.480     304.7505    2822.5354   309.9065    314.8250    250891.787
2 rowTabulates_X_S  291.203     304.8035    316.6833    310.5615    318.9695    485.174

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.0000000   1.000000    1.0000000   1.000000    1.000000    1.0000000
2 rowTabulates_X_S  0.9788994   1.000174    0.1121982   1.002114    1.013165    0.0019338

Figure: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 10x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

100x1000 matrix

> X <- data[["100x1000"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204643 171.2    5709258 305.0  5709258 305.0
Vcells 6195618  47.3   22345455 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204634 171.2    5709258 305.0  5709258 305.0
Vcells 6245696  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x1000 data. The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.567425    1.639608    1.705160    1.655323    1.771980    2.380615
2 colTabulates(X, rows, cols)  1.645321    1.694262    1.870166    1.728113    1.860080    8.493609
3 colTabulates(X[rows, cols])  1.642920    1.715765    4.394435    1.736148    1.947884    252.808727

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
2 colTabulates(X, rows, cols)  1.049697    1.033334    1.096768    1.043974    1.049718    3.567821
3 colTabulates(X[rows, cols])  1.048165    1.046449    2.577139    1.048828    1.099269    106.194713

Table: Benchmarking of rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 100x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.748535    1.842582    1.994827    1.887706    1.920214    8.649732
2 rowTabulates(X, cols, rows)  1.823317    1.918680    2.068009    1.945528    1.992203    8.626591
3 rowTabulates(X[cols, rows])  1.831858    1.923616    1.991026    1.959248    2.003563    2.853159

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.000000    1.000000    1.0000000   1.000000    1.000000    1.0000000
2 rowTabulates(X, cols, rows)  1.042768    1.041299    1.0366861   1.030631    1.037490    0.9973247
3 rowTabulates(X[cols, rows])  1.047653    1.043978    0.9980948   1.037899    1.043407    0.3298552

Figure: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x1000 data, as well as rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 100x1000 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.567425    1.639608    1.705160    1.655323    1.771980    2.380615
2 rowTabulates_X_S  1.748535    1.842582    1.994827    1.887706    1.920214    8.649732

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
2 rowTabulates_X_S  1.115546    1.123795    1.169876    1.140385    1.083654    3.633402

Figure: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 100x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x100 matrix

> X <- data[["1000x100"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204859 171.2    5709258 305.0  5709258 305.0
Vcells 6196447  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204850 171.2    5709258 305.0  5709258 305.0
Vcells 6246525  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x100 data. The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.418797    1.470277    1.498454    1.489486    1.504072    1.788356
2 colTabulates(X, rows, cols)  1.485973    1.556599    1.660665    1.563964    1.601022    8.140991
3 colTabulates(X[rows, cols])  1.484032    1.556742    1.657618    1.565163    1.593916    8.321581

  expr                         min         lq          mean        median      uq          max
1 colTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
2 colTabulates(X, rows, cols)  1.047347    1.058712    1.108252    1.050003    1.064458    4.552221
3 colTabulates(X[rows, cols])  1.045979    1.058808    1.106219    1.050808    1.059734    4.653202

Table: Benchmarking of rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 1000x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.819188    1.922722    2.013304    1.977278    2.113175    3.094882
2 rowTabulates(X, cols, rows)  1.904025    2.013527    2.153842    2.059273    2.173602    8.314753
3 rowTabulates(X[cols, rows])  1.907501    2.010804    2.187695    2.080598    2.263897    8.807484

  expr                         min         lq          mean        median      uq          max
1 rowTabulates_X_S             1.000000    1.000000    1.000000    1.000000    1.000000    1.000000
2 rowTabulates(X, cols, rows)  1.046634    1.047228    1.069805    1.041468    1.028595    2.686614
3 rowTabulates(X[cols, rows])  1.048545    1.045811    1.086620    1.052254    1.071325    2.845822

Figure: Benchmarking of colTabulates_X_S, colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x100 data, as well as rowTabulates_X_S, rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 1000x100 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows times relative to the first row.

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.418797    1.470277    1.498454    1.489486    1.504072    1.788356
2 rowTabulates_X_S  1.819188    1.922722    2.013304    1.977278    2.113175    3.094882

  expr              min         lq          mean        median      uq          max
1 colTabulates_X_S  1.000000    1.000000    1.000000    1.00000     1.000000    1.000000
2 rowTabulates_X_S  1.282205    1.307728    1.343587    1.32749     1.404969    1.730574

Figure: Benchmarking of colTabulates_X_S and rowTabulates_X_S on 1000x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 15.56 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('colRowTabulates_subset')
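
The call above relies on the package's internal report machinery (note the ::: access). For a quick check without it, a single comparison can be timed directly. This is a rough sketch, assuming matrixStats and microbenchmark are installed, with a made-up matrix and a reduced number of evaluations (times = 20 is an arbitrary choice) rather than the report's exact setup:

```r
library(matrixStats)
library(microbenchmark)

set.seed(1)
# Illustrative 100x100 integer matrix, not the data used in this report
X <- matrix(sample(-10:10, size = 100 * 100, replace = TRUE),
            nrow = 100, ncol = 100)
rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
X_S <- X[rows, cols]

stats <- microbenchmark(
  colTabulates_X_S              = colTabulates(X_S),
  `colTabulates(X, rows, cols)` = colTabulates(X, rows = rows, cols = cols),
  `colTabulates(X[rows, cols])` = colTabulates(X[rows, cols]),
  times = 20, unit = "ms"
)

# One row per expression, with the same statistics as the tables above
res <- summary(stats)
print(res[, c("expr", "min", "median", "max")])
```

Timings from such a run will differ from the tables in this report, since they depend on the machine, the R version and the matrixStats version.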

Copyright Dongcan Jiang. Last updated on 2019-09-10 20:53:52 (-0700 UTC). Powered by RSP.
