colRowTabulates - HenrikBengtsson/matrixStats GitHub Wiki

matrixStats: Benchmark report


colTabulates() and rowTabulates() benchmarks

This report benchmarks the performance of colTabulates() and rowTabulates() against alternative methods.
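For readers unfamiliar with the operation being timed, the following is a minimal base R sketch of a column-wise tabulation: for every column of an integer matrix, count how often each value occurs. The helper name col_tabulate_base() is hypothetical and is not part of matrixStats. The output layout (one row per column of the input, one column per counted value) is an assumption chosen for illustration. It is not one of the alternative methods benchmarked in this report.

```r
# Hypothetical helper (not part of matrixStats): count, for each column of x,
# how often each value in `values` occurs.
col_tabulate_base <- function(x, values = sort(unique(as.vector(x)))) {
  stopifnot(is.matrix(x))
  counts <- vapply(
    seq_len(ncol(x)),
    # match() maps each element to its position in `values`;
    # tabulate() counts those positions and ignores NAs.
    function(j) tabulate(match(x[, j], values), nbins = length(values)),
    FUN.VALUE = integer(length(values))
  )
  # Keep a matrix shape even when only one value is counted, then
  # transpose so that each row corresponds to one column of x.
  counts <- t(matrix(counts, nrow = length(values)))
  colnames(counts) <- values
  counts
}

# Column 1 holds 1, 2, 2 and column 2 holds 3, 3, 3.
x <- matrix(c(1L, 2L, 2L,
              3L, 3L, 3L), nrow = 3, ncol = 2)
res <- col_tabulate_base(x)
print(res)
```

The first row of the result counts the values in column 1 (one 1, two 2s, no 3s) and the second row counts the values in column 2 (three 3s).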

Alternative methods

  • ???

Data

> rmatrix <- function(nrow, ncol, mode = c("logical", "double", "integer", "index"), range = c(-100, 
+     +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     n <- nrow * ncol
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else if (mode == "index") {
+         x <- seq_len(n)
+         mode <- "integer"
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     dim(x) <- c(nrow, ncol)
+     x
+ }
> rmatrices <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rmatrix(nrow = scale * 1, ncol = scale * 1, ...)
+     data[[2]] <- rmatrix(nrow = scale * 10, ncol = scale * 10, ...)
+     data[[3]] <- rmatrix(nrow = scale * 100, ncol = scale * 1, ...)
+     data[[4]] <- t(data[[3]])
+     data[[5]] <- rmatrix(nrow = scale * 10, ncol = scale * 100, ...)
+     data[[6]] <- t(data[[5]])
+     names(data) <- sapply(data, FUN = function(x) paste(dim(x), collapse = "x"))
+     data
+ }
> data <- rmatrices(mode = "integer", range = c(-10, 10))
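Two properties of the generated data are easy to miss when reading the code above. The sketch below illustrates them under the same settings (scale = 10, range = c(-10, 10)). The variable names are illustrative only. First, coercing uniform doubles to integer storage truncates toward zero, so the tabulated values fall in -9..9. Second, the list names are built from the matrix dimensions, which is why the Results sections index the data as data[["10x10"]] and so on.

```r
# 1. Integer coercion truncates toward zero, so values drawn from
#    runif(min = -10, max = 10) end up in the range -9..9.
set.seed(1)
x <- runif(1000, min = -10, max = 10)
storage.mode(x) <- "integer"
print(range(x))

# 2. Names of the six matrices for scale = 10, in the order they are created.
#    Each pair is (row multiplier, column multiplier); the transposed
#    matrices simply swap the two.
scale <- 10
multipliers <- list(c(1, 1), c(10, 10), c(100, 1), c(1, 100), c(10, 100), c(100, 10))
labels <- vapply(multipliers,
                 function(m) paste(m * scale, collapse = "x"),
                 character(1))
print(labels)
```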

Results

10x10 matrix

> X <- data[["10x10"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3208969 171.4    5709258 305.0  5709258 305.0
Vcells 6387840  48.8   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3207288 171.3    5709258 305.0  5709258 305.0
Vcells 6382777  48.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 10x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 0.097305 0.0985175 0.100785 0.099041 0.0997155 0.230715
expr min lq mean median uq max
1 colTabulates 1 1 1 1 1 1

Table: Benchmarking of rowTabulates() on 10x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 rowTabulates 0.0947 0.095504 0.0980804 0.096104 0.096741 0.227335
expr min lq mean median uq max
1 rowTabulates 1 1 1 1 1 1

Figure: Benchmarking of colTabulates() on 10x10 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 10x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr min lq mean median uq max
2 rowTabulates 94.700 95.5040 98.08044 96.104 96.7410 227.335
1 colTabulates 97.305 98.5175 100.78495 99.041 99.7155 230.715
expr min lq mean median uq max
2 rowTabulates 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 colTabulates 1.027508 1.031554 1.027574 1.030561 1.030747 1.014868

Figure: Benchmarking of colTabulates() and rowTabulates() on 10x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.
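The relationship between the panels can be checked by hand. The sketch below uses the median times from the combined 10x10 table. Those values are 1000 times the ones in the per-function tables, so they appear to be in microseconds. The rule that relative times are computed against the expression with the smallest median is inferred from the tables in this report and is not taken from the benchmark source code.

```r
# Median times copied from the combined 10x10 table (microseconds).
median_us <- c(rowTabulates = 96.104, colTabulates = 99.041)

# The same medians in milliseconds, matching the per-function tables.
median_ms <- median_us / 1000

# Relative time: divide by the value of the expression with the smallest median.
reference <- names(which.min(median_us))
relative <- median_us / median_us[[reference]]

print(median_ms)
print(reference)
print(round(relative, 6))
```

With these inputs the reference is rowTabulates, and the relative median of colTabulates agrees with the 1.030561 shown in the bottom panel.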

100x100 matrix

> X <- data[["100x100"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3205796 171.3    5709258 305.0  5709258 305.0
Vcells 6188333  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3205790 171.3    5709258 305.0  5709258 305.0
Vcells 6193376  47.3   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 100x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 0.398019 0.4012145 0.4066294 0.403242 0.4084285 0.535625
expr min lq mean median uq max
1 colTabulates 1 1 1 1 1 1

Table: Benchmarking of rowTabulates() on 100x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 rowTabulates 0.456053 0.4607885 0.4757639 0.4671985 0.472886 0.690054
expr min lq mean median uq max
1 rowTabulates 1 1 1 1 1 1

Figure: Benchmarking of colTabulates() on 100x100 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 100x100 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 398.019 401.2145 406.6294 403.2420 408.4285 535.625
2 rowTabulates 456.053 460.7885 475.7639 467.1985 472.8860 690.054
expr min lq mean median uq max
1 colTabulates 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2 rowTabulates 1.145807 1.148484 1.170018 1.158606 1.157818 1.288315

Figure: Benchmarking of colTabulates() and rowTabulates() on 100x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x10 matrix

> X <- data[["1000x10"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206509 171.3    5709258 305.0  5709258 305.0
Vcells 6191627  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206500 171.3    5709258 305.0  5709258 305.0
Vcells 6196665  47.3   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 1000x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 0.37314 0.3751665 0.3802066 0.377041 0.381153 0.50368
expr min lq mean median uq max
1 colTabulates 1 1 1 1 1 1

Table: Benchmarking of rowTabulates() on 1000x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 rowTabulates 0.461854 0.469589 0.4768716 0.4736355 0.4802825 0.639372
expr min lq mean median uq max
1 rowTabulates 1 1 1 1 1 1

Figure: Benchmarking of colTabulates() on 1000x10 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 1000x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 373.140 375.1665 380.2066 377.0410 381.1530 503.680
2 rowTabulates 461.854 469.5890 476.8716 473.6355 480.2825 639.372
expr min lq mean median uq max
1 colTabulates 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000
2 rowTabulates 1.23775 1.251682 1.254244 1.256191 1.260078 1.269401

Figure: Benchmarking of colTabulates() and rowTabulates() on 1000x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

10x1000 matrix

> X <- data[["10x1000"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206677 171.3    5709258 305.0  5709258 305.0
Vcells 6192196  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206671 171.3    5709258 305.0  5709258 305.0
Vcells 6197239  47.3   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 10x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 0.480075 0.486541 0.5151467 0.500231 0.5298665 0.825429
expr min lq mean median uq max
1 colTabulates 1 1 1 1 1 1

Table: Benchmarking of rowTabulates() on 10x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 rowTabulates 0.488855 0.499118 0.5067292 0.5035765 0.508941 0.628956
expr min lq mean median uq max
1 rowTabulates 1 1 1 1 1 1

Figure: Benchmarking of colTabulates() on 10x1000 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 10x1000 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 480.075 486.541 515.1467 500.2310 529.8665 825.429
2 rowTabulates 488.855 499.118 506.7292 503.5765 508.9410 628.956
expr min lq mean median uq max
1 colTabulates 1.000000 1.00000 1.00000 1.000000 1.000000 1.0000000
2 rowTabulates 1.018289 1.02585 0.98366 1.006688 0.960508 0.7619747

Figure: Benchmarking of colTabulates() and rowTabulates() on 10x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

100x1000 matrix

> X <- data[["100x1000"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206848 171.3    5709258 305.0  5709258 305.0
Vcells 6192610  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3206842 171.3    5709258 305.0  5709258 305.0
Vcells 6242653  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 100x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 2.951937 3.346852 3.654082 3.549232 3.695155 12.4183
expr min lq mean median uq max
1 colTabulates 1 1 1 1 1 1

Table: Benchmarking of rowTabulates() on 100x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 rowTabulates 3.421477 3.654142 3.837853 3.700607 3.787078 10.39184
expr min lq mean median uq max
1 rowTabulates 1 1 1 1 1 1

Figure: Benchmarking of colTabulates() on 100x1000 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 100x1000 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 2.951937 3.346852 3.654082 3.549232 3.695155 12.41830
2 rowTabulates 3.421477 3.654142 3.837853 3.700607 3.787078 10.39184
expr min lq mean median uq max
1 colTabulates 1.000000 1.000000 1.000000 1.00000 1.000000 1.0000000
2 rowTabulates 1.159062 1.091815 1.050292 1.04265 1.024877 0.8368168

Figure: Benchmarking of colTabulates() and rowTabulates() on 100x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x100 matrix

> X <- data[["1000x100"]]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3207019 171.3    5709258 305.0  5709258 305.0
Vcells 6193080  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates = colTabulates(X, na.rm = FALSE), unit = "ms")
> X <- t(X)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3207013 171.3    5709258 305.0  5709258 305.0
Vcells 6243123  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates = rowTabulates(X, na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates() on 1000x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 2.744159 3.067887 3.254978 3.185613 3.323376 13.0348
expr min lq mean median uq max
1 colTabulates 1 1 1 1 1 1

Table: Benchmarking of rowTabulates() on 1000x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 rowTabulates 3.512745 3.794685 4.023299 3.829916 4.242972 10.3594
expr min lq mean median uq max
1 rowTabulates 1 1 1 1 1 1

Figure: Benchmarking of colTabulates() on 1000x100 data as well as rowTabulates() on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates() and rowTabulates() on 1000x100 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

expr min lq mean median uq max
1 colTabulates 2.744159 3.067887 3.254978 3.185613 3.323376 13.0348
2 rowTabulates 3.512745 3.794685 4.023299 3.829916 4.242972 10.3594
expr min lq mean median uq max
1 colTabulates 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000
2 rowTabulates 1.280081 1.236905 1.236045 1.202254 1.276706 0.7947496

Figure: Benchmarking of colTabulates() and rowTabulates() on 1000x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 13.14 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('colTabulates')
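The call above relies on an unexported function, accessed with `:::`, that may not be available in every installation. As a smaller alternative, a single timing in the style of the Results sections can be run by hand. The following is a sketch only: bench_tabulates() is a hypothetical helper, its matrix size and number of repetitions are arbitrary, and it draws values from -9..9 directly with sample() rather than through the report's rmatrix() generator. It returns NULL if matrixStats or microbenchmark is not installed.

```r
# Hypothetical helper (not part of matrixStats): time colTabulates() and
# rowTabulates() on one integer matrix, if both packages are installed.
bench_tabulates <- function(nrow = 100L, ncol = 100L, times = 20L, seed = 1) {
  if (!requireNamespace("matrixStats", quietly = TRUE) ||
      !requireNamespace("microbenchmark", quietly = TRUE)) {
    message("matrixStats and/or microbenchmark not installed; skipping.")
    return(NULL)
  }
  set.seed(seed)
  X <- matrix(sample(-9:9, size = nrow * ncol, replace = TRUE),
              nrow = nrow, ncol = ncol)
  tX <- t(X)  # rowTabulates() is timed on the transposed data, as in the report
  mb <- microbenchmark::microbenchmark(
    colTabulates = matrixStats::colTabulates(X),
    rowTabulates = matrixStats::rowTabulates(tX),
    times = times, unit = "ms"
  )
  summary(mb)  # data frame with expr, min, lq, mean, median, uq, max, neval
}

stats <- bench_tabulates()
if (!is.null(stats)) print(stats[, c("expr", "min", "median", "max")])
```

Timings obtained this way depend on the machine and package versions, so they are not expected to match the tables in this report.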

Copyright Henrik Bengtsson. Last updated on 2019-09-10 20:54:07 (-0700 UTC). Powered by RSP.
