colRowTabulates - HenrikBengtsson/matrixStats GitHub Wiki
matrixStats: Benchmark report
This report benchmarks the performance of colTabulates() and rowTabulates() against alternative methods.
- ???
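The report itself lists no alternative implementations. For orientation only, here is a minimal base-R sketch of what a per-column tabulation computes: for each column, how often each distinct value occurs. The function name `col_tabulates_ref` and its `values` argument are hypothetical and not part of matrixStats, and the orientation of the result (one row per column of the input) is an assumption.

```r
# Hypothetical base-R reference for a per-column tabulation.
# Not the matrixStats implementation; names and result layout are assumptions.
col_tabulates_ref <- function(x, values = sort(unique(as.vector(x)))) {
  # For each column, count how often each entry of 'values' occurs.
  counts <- vapply(seq_len(ncol(x)), function(j) {
    tabulate(match(x[, j], values), nbins = length(values))
  }, integer(length(values)))
  # Keep a matrix shape even when there is only one distinct value.
  counts <- matrix(counts, nrow = length(values))
  counts <- t(counts)  # one row per column of 'x'
  colnames(counts) <- values
  counts
}

# Toy input: column 1 holds 1, 2, 2 and column 2 holds 3, 3, 3.
x <- matrix(c(1L, 2L, 2L, 3L, 3L, 3L), nrow = 3)
print(col_tabulates_ref(x))
```

In the toy matrix, the first column contains the value 1 once and the value 2 twice, while the second column contains the value 3 three times, so each row of the result sums to the number of rows of `x`.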
> rmatrix <- function(nrow, ncol, mode = c("logical", "double", "integer", "index"), range = c(-100, 
+     +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     n <- nrow * ncol
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     }     else if (mode == "index") {
+         x <- seq_len(n)
+         mode <- "integer"
+     }     else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     dim(x) <- c(nrow, ncol)
+     x
+ }
> rmatrices <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rmatrix(nrow = scale * 1, ncol = scale * 1, ...)
+     data[[2]] <- rmatrix(nrow = scale * 10, ncol = scale * 10, ...)
+     data[[3]] <- rmatrix(nrow = scale * 100, ncol = scale * 1, ...)
+     data[[4]] <- t(data[[3]])
+     data[[5]] <- rmatrix(nrow = scale * 10, ncol = scale * 100, ...)
+     data[[6]] <- t(data[[5]])
+     names(data) <- sapply(data, FUN = function(x) paste(dim(x), collapse = "x"))
+     data
+ }
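As a quick check of the naming scheme, the sketch below recomputes the dimension labels that `rmatrices()` assigns with its default `scale = 10`, without generating any data. The `dims` list simply restates the `nrow`/`ncol` arguments from the function above (entries 4 and 6 are the transposes of entries 3 and 5); the variable names are illustrative.

```r
# Recompute the labels that rmatrices() attaches to its six matrices.
scale <- 10
dims <- list(
  c(scale * 1,   scale * 1),    # data[[1]]
  c(scale * 10,  scale * 10),   # data[[2]]
  c(scale * 100, scale * 1),    # data[[3]]
  c(scale * 1,   scale * 100),  # data[[4]] = t(data[[3]])
  c(scale * 10,  scale * 100),  # data[[5]]
  c(scale * 100, scale * 10)    # data[[6]] = t(data[[5]])
)
dim_labels <- vapply(dims, function(d) paste(d, collapse = "x"), character(1))
print(dim_labels)
```

These labels are the keys used to select each matrix in the sections that follow, for example `data[["10x10"]]`.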
> data <- rmatrices(mode = "integer", range = c(-10, 10))
> X <- data[["10x10"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3208969 171.4    5709258 305.0  5709258 305.0
Vcells 6387840  48.8   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3207288 171.3    5709258 305.0  5709258 305.0
Vcells 6382777  48.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
Table: Benchmarking of colTabulates() on 10x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 0.097305 | 0.0985175 | 0.100785 | 0.099041 | 0.0997155 | 0.230715 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Table: Benchmarking of rowTabulates() on 10x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 0.0947 | 0.095504 | 0.0980804 | 0.096104 | 0.096741 | 0.227335 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Figure: Benchmarking of colTabulates() on 10x10 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 10x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | rowTabulates | 94.700 | 95.5040 | 98.08044 | 96.104 | 96.7410 | 227.335 |
| 1 | colTabulates | 97.305 | 98.5175 | 100.78495 | 99.041 | 99.7155 | 230.715 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | rowTabulates | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 1 | colTabulates | 1.027508 | 1.031554 | 1.027574 | 1.030561 | 1.030747 | 1.014868 |
Figure: Benchmarking of colTabulates() and rowTabulates() on 10x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.
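The relative panel of the combined 10x10 table can be reproduced from its absolute panel by dividing every row by a reference row. The sketch below does this for the `min` and `median` columns, using the values as printed in that table. Choosing the reference as the row with the smallest median is an assumption about how the report normalizes, and the variable names are illustrative.

```r
# Absolute timings copied from the combined 10x10 table (min and median only).
times <- rbind(
  rowTabulates = c(min = 94.700, median = 96.104),
  colTabulates = c(min = 97.305, median = 99.041)
)

# Assumed reference: the expression with the smallest median.
ref <- which.min(times[, "median"])

# Divide each column by the reference row's value in that column.
relative <- sweep(times, 2, times[ref, ], "/")
print(round(relative, 6))
```

Because each column is scaled independently, the reference row is exactly 1 everywhere, and the other row shows how much slower (or faster) it is for each summary statistic.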

> X <- data[["100x100"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3205796 171.3    5709258 305.0  5709258 305.0
Vcells 6188333  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3205790 171.3    5709258 305.0  5709258 305.0
Vcells 6193376  47.3   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
Table: Benchmarking of colTabulates() on 100x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 0.398019 | 0.4012145 | 0.4066294 | 0.403242 | 0.4084285 | 0.535625 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Table: Benchmarking of rowTabulates() on 100x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 0.456053 | 0.4607885 | 0.4757639 | 0.4671985 | 0.472886 | 0.690054 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Figure: Benchmarking of colTabulates() on 100x100 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 100x100 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 398.019 | 401.2145 | 406.6294 | 403.2420 | 408.4285 | 535.625 |
| 2 | rowTabulates | 456.053 | 460.7885 | 475.7639 | 467.1985 | 472.8860 | 690.054 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | rowTabulates | 1.145807 | 1.148484 | 1.170018 | 1.158606 | 1.157818 | 1.288315 |
Figure: Benchmarking of colTabulates() and rowTabulates() on 100x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

> X <- data[["1000x10"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206509 171.3    5709258 305.0  5709258 305.0
Vcells 6191627  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206500 171.3    5709258 305.0  5709258 305.0
Vcells 6196665  47.3   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
Table: Benchmarking of colTabulates() on 1000x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 0.37314 | 0.3751665 | 0.3802066 | 0.377041 | 0.381153 | 0.50368 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Table: Benchmarking of rowTabulates() on 1000x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 0.461854 | 0.469589 | 0.4768716 | 0.4736355 | 0.4802825 | 0.639372 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Figure: Benchmarking of colTabulates() on 1000x10 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 1000x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 373.140 | 375.1665 | 380.2066 | 377.0410 | 381.1530 | 503.680 |
| 2 | rowTabulates | 461.854 | 469.5890 | 476.8716 | 473.6355 | 480.2825 | 639.372 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| 2 | rowTabulates | 1.23775 | 1.251682 | 1.254244 | 1.256191 | 1.260078 | 1.269401 |
Figure: Benchmarking of colTabulates() and rowTabulates() on 1000x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

> X <- data[["10x1000"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206677 171.3    5709258 305.0  5709258 305.0
Vcells 6192196  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206671 171.3    5709258 305.0  5709258 305.0
Vcells 6197239  47.3   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
Table: Benchmarking of colTabulates() on 10x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 0.480075 | 0.486541 | 0.5151467 | 0.500231 | 0.5298665 | 0.825429 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Table: Benchmarking of rowTabulates() on 10x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 0.488855 | 0.499118 | 0.5067292 | 0.5035765 | 0.508941 | 0.628956 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Figure: Benchmarking of colTabulates() on 10x1000 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 10x1000 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 480.075 | 486.541 | 515.1467 | 500.2310 | 529.8665 | 825.429 |
| 2 | rowTabulates | 488.855 | 499.118 | 506.7292 | 503.5765 | 508.9410 | 628.956 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1.000000 | 1.00000 | 1.00000 | 1.000000 | 1.000000 | 1.0000000 |
| 2 | rowTabulates | 1.018289 | 1.02585 | 0.98366 | 1.006688 | 0.960508 | 0.7619747 |
Figure: Benchmarking of colTabulates() and rowTabulates() on 10x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

> X <- data[["100x1000"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206848 171.3    5709258 305.0  5709258 305.0
Vcells 6192610  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206842 171.3    5709258 305.0  5709258 305.0
Vcells 6242653  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
Table: Benchmarking of colTabulates() on 100x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 2.951937 | 3.346852 | 3.654082 | 3.549232 | 3.695155 | 12.4183 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Table: Benchmarking of rowTabulates() on 100x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 3.421477 | 3.654142 | 3.837853 | 3.700607 | 3.787078 | 10.39184 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Figure: Benchmarking of colTabulates() on 100x1000 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 100x1000 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 2.951937 | 3.346852 | 3.654082 | 3.549232 | 3.695155 | 12.41830 |
| 2 | rowTabulates | 3.421477 | 3.654142 | 3.837853 | 3.700607 | 3.787078 | 10.39184 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1.000000 | 1.000000 | 1.000000 | 1.00000 | 1.000000 | 1.0000000 |
| 2 | rowTabulates | 1.159062 | 1.091815 | 1.050292 | 1.04265 | 1.024877 | 0.8368168 |
Figure: Benchmarking of colTabulates() and rowTabulates() on 100x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

> X <- data[["1000x100"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3207019 171.3    5709258 305.0  5709258 305.0
Vcells 6193080  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3207013 171.3    5709258 305.0  5709258 305.0
Vcells 6243123  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")
Table: Benchmarking of colTabulates() on 1000x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 2.744159 | 3.067887 | 3.254978 | 3.185613 | 3.323376 | 13.0348 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Table: Benchmarking of rowTabulates() on 1000x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 3.512745 | 3.794685 | 4.023299 | 3.829916 | 4.242972 | 10.3594 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | rowTabulates | 1 | 1 | 1 | 1 | 1 | 1 |
Figure: Benchmarking of colTabulates() on 1000x100 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 1000x100 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.
|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 2.744159 | 3.067887 | 3.254978 | 3.185613 | 3.323376 | 13.0348 |
| 2 | rowTabulates | 3.512745 | 3.794685 | 4.023299 | 3.829916 | 4.242972 | 10.3594 |

|   | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 1 | colTabulates | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0000000 |
| 2 | rowTabulates | 1.280081 | 1.236905 | 1.236045 | 1.202254 | 1.276706 | 0.7947496 |
Figure: Benchmarking of colTabulates() and rowTabulates() on 1000x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     
loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3
Total processing time was 13.14 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('colTabulates')
Copyright Henrik Bengtsson. Last updated on 2019-09-10 20:54:07 (-0700 UTC). Powered by RSP.