binCounts_subset - HenrikBengtsson/matrixStats GitHub Wiki

matrixStats: Benchmark report


binCounts() benchmarks on subsetted computation

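This report benchmarks the performance of binCounts() when it operates on a subset of the data. It compares three approaches: counting a pre-subsetted vector x_S, passing the subset indices through the idxs argument, and subsetting inline with x[idxs].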
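For reference, the counting that binCounts() performs with its default right = FALSE can be sketched in base R with findInterval() and tabulate(). That is, it counts values in the half-open bins [bx[1], bx[2]), ..., [bx[B], bx[B+1]). The helper bin_counts_ref() is hypothetical and is not part of matrixStats. Its idxs argument simply subsets first, so it illustrates only the expected equivalence of the idxs and x[idxs] forms, not how matrixStats handles idxs internally.

```r
# Hypothetical base-R reference for binCounts(x, bx = bx, right = FALSE):
# count how many values of x fall into each half-open bin [bx[j], bx[j+1]).
bin_counts_ref <- function(x, bx, idxs = NULL) {
  # Mimic the 'idxs' argument by subsetting first (semantics only;
  # matrixStats may avoid this copy internally).
  if (!is.null(idxs)) x <- x[idxs]
  nbins <- length(bx) - 1L
  # findInterval() returns j such that bx[j] <= x < bx[j+1];
  # 0 and length(bx) mark values outside [bx[1], bx[length(bx)]).
  j <- findInterval(x, bx)
  # tabulate() silently drops indices outside 1..nbins.
  tabulate(j, nbins = nbins)
}

x <- c(0.5, 1, 1.5, 2, 2.5, 3.7)
bx <- c(0, 1, 2, 3, 4)
counts <- bin_counts_ref(x, bx)                       # all six values
counts_sub <- bin_counts_ref(x, bx, idxs = c(2, 4, 6)) # values 1, 2 and 3.7
print(counts)
print(counts_sub)
```

With the report's data, bin_counts_ref(x, bx, idxs) and bin_counts_ref(x[idxs], bx) would agree in the same way. The benchmarks below time the real binCounts() on both forms and on the pre-subsetted copy x_S.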

Data type "integer"

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 int [1:100000] 722 285 591 3 349 509 216 91 150 383 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> idxs <- sample.int(length(x), size = length(x) * 0.7)
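The breaks above consist of nb + 1 = 10001 equally spaced points from 0 to xmax = 1000. Each inner bin is therefore 0.1 wide. One padding break is added at -1 and one at xmax + 1, which gives 10003 breaks and 10002 bins. Because runif() draws strictly between 0 and xmax, the two padding bins [-1, 0) and [xmax, xmax + 1) stay empty for these data. The transcript does not show where `mode` is set. Presumably, the benchmark script assigns "integer" or "double" for each section, but that is an assumption. The sketch below rebuilds the breaks and checks these properties on the double draws.

```r
nx <- 1e+05
xmax <- 0.01 * nx                      # 1000
nb <- 10000
bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
bx <- c(-1, bx, xmax + 1)              # one padding break on each side

n_breaks <- length(bx)                 # nb + 3 breaks
n_bins <- n_breaks - 1L                # nb + 2 bins
inner_width <- diff(bx[2:(nb + 2L)])   # widths of the nb bins covering [0, xmax]

# Same draws as the report; runif() returns values strictly inside (0, xmax),
# so nothing falls in the padding bins [-1, 0) or [xmax, xmax + 1).
set.seed(48879)
x <- runif(nx, min = 0, max = xmax)
in_inner_range <- all(x > 0 & x < xmax)
```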

Results

> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3053531 163.1    5709258  305  5709258 305.0
Vcells 16188101 123.6   34595238  264 56666022 432.4
> stats <- microbenchmark(binCounts_x_S = binCounts(x_S, bx = bx), `binCounts(x, idxs)` = binCounts(x, 
+     idxs = idxs, bx = bx), `binCounts(x[idxs])` = binCounts(x[idxs], bx = bx), unit = "ms")

Table: Benchmarking of binCounts_x_S (binCounts(x_S, bx = bx) on the pre-subsetted vector x_S), binCounts(x, idxs) and binCounts(x[idxs]) on integer+unsorted data. The top panel shows times in milliseconds, and the bottom panel shows times relative to binCounts_x_S.

  expr                    min       lq     mean   median       uq       max
1 binCounts_x_S      3.464805 3.516767 3.848541 3.584201 4.049233  8.374912
3 binCounts(x[idxs]) 3.631464 3.711681 3.989712 3.735329 3.951278 11.965669
2 binCounts(x, idxs) 3.620185 3.716548 3.921681 3.751879 4.284023  4.448815

  expr                    min       lq     mean   median       uq       max
1 binCounts_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000
3 binCounts(x[idxs]) 1.048101 1.055424 1.036682 1.042165 0.975809 1.4287516
2 binCounts(x, idxs) 1.044845 1.056808 1.019005 1.046783 1.057984 0.5312074

Figure: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on integer+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.
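The relative panel is the time panel divided, column by column, by the binCounts_x_S row. The sketch below recomputes three columns from the integer+unsorted times. The last digit can differ slightly from the report, because the report divides unrounded times. The sketch also shows why a relative value below 1 does not indicate a speed-up here. The 0.5312074 in the max column for binCounts(x, idxs) arises because the baseline's own max (8.374912 ms) is an outlier far above its median (3.584201 ms).

```r
# Times in milliseconds, copied from the integer+unsorted table.
times <- rbind(
  binCounts_x_S        = c(min = 3.464805, median = 3.584201, max = 8.374912),
  `binCounts(x[idxs])` = c(min = 3.631464, median = 3.735329, max = 11.965669),
  `binCounts(x, idxs)` = c(min = 3.620185, median = 3.751879, max = 4.448815)
)

# Divide every row by the baseline row (column-wise) to get relative times.
rel <- sweep(times, 2, times["binCounts_x_S", ], FUN = "/")
print(round(rel, 6))
```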

Sorted simulated data

> x <- sort(x)
> idxs <- sort(idxs)
> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3050781 163.0    5709258 305.0  5709258 305.0
Vcells 5051589  38.6   27676191 211.2 56666022 432.4
> stats <- microbenchmark(binCounts_x_S = binCounts(x_S, bx = bx), `binCounts(x, idxs)` = binCounts(x, 
+     idxs = idxs, bx = bx), `binCounts(x[idxs])` = binCounts(x[idxs], bx = bx), unit = "ms")
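In this scenario, both x and idxs are sorted before x_S <- x[idxs] is formed. As a result, x_S and x[idxs] are themselves sorted, because taking elements of a sorted vector at increasing positions preserves their order. The toy check below uses 20 values, not the report's data. It illustrates this property and contrasts it with a decreasing index vector.

```r
set.seed(1)
x <- sort(runif(20, min = 0, max = 10))
idxs <- sort(sample.int(length(x), size = 14))
x_S <- x[idxs]

# Subsetting a sorted vector at increasing positions keeps it sorted ...
sorted_subset <- !is.unsorted(x_S)
# ... whereas reversed (decreasing) positions do not.
x_U <- x[rev(idxs)]
print(c(sorted_subset = sorted_subset, reversed_unsorted = is.unsorted(x_U)))
```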

Table: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on integer+sorted data. The top panel shows times in milliseconds, and the bottom panel shows times relative to binCounts_x_S.

  expr                    min        lq      mean    median        uq      max
1 binCounts_x_S      0.342860 0.3640255 0.4647694 0.3752860 0.3901185 3.723007
2 binCounts(x, idxs) 0.545045 0.5866205 0.7336247 0.6044015 0.6292265 4.455126
3 binCounts(x[idxs]) 0.544250 0.5880440 0.7073336 0.6047335 0.6287325 3.960708

  expr                    min       lq     mean   median       uq      max
1 binCounts_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2 binCounts(x, idxs) 1.589701 1.611482 1.578470 1.610509 1.612911 1.196647
3 binCounts(x[idxs]) 1.587383 1.615392 1.521902 1.611394 1.611645 1.063846

Figure: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on integer+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
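The two integer tables can be compared directly. The baseline median falls from 3.584201 ms (unsorted) to 0.3752860 ms (sorted). Over the same change, the relative median of the two subsetting variants rises from about 1.04 to about 1.61. The arithmetic below uses medians copied from the tables. The labels x_idxs for binCounts(x[idxs]) and idxs_arg for binCounts(x, idxs) are ad hoc. The calculation shows that the absolute extra time of the subsetting variants stays at a few tenths of a millisecond, so it weighs more heavily against the faster sorted baseline. This report does not examine why sorted input is faster.

```r
# Median times (ms) for integer data, copied from the two tables.
med_unsorted <- c(x_S = 3.584201, x_idxs = 3.735329, idxs_arg = 3.751879)
med_sorted   <- c(x_S = 0.3752860, x_idxs = 0.6047335, idxs_arg = 0.6044015)

# How much faster the baseline is on sorted input.
speedup_sorted <- med_unsorted[["x_S"]] / med_sorted[["x_S"]]

# Absolute extra time (ms) of the two subsetting variants over the baseline.
extra_unsorted <- med_unsorted[-1] - med_unsorted[["x_S"]]
extra_sorted   <- med_sorted[-1] - med_sorted[["x_S"]]

print(round(c(speedup_sorted = speedup_sorted), 2))
print(round(rbind(extra_unsorted, extra_sorted), 3))
```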

Data type "double"

Non-sorted simulated data

> set.seed(48879)
> nx <- 1e+05
> xmax <- 0.01 * nx
> x <- runif(nx, min = 0, max = xmax)
> storage.mode(x) <- mode
> str(x)
 num [1:100000] 722.11 285.54 591.33 3.42 349.14 ...
> nb <- 10000
> bx <- seq(from = 0, to = xmax, length.out = nb + 1L)
> bx <- c(-1, bx, xmax + 1)
> idxs <- sample.int(length(x), size = length(x) * 0.7)
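The double data come from the same seed and code as the integer data; only `mode` differs. Coercing with storage.mode() to "integer" truncates toward zero. That is why the integer str() output (722 285 591 3 349 ...) matches the integer parts of the double output (722.11 285.54 591.33 3.42 349.14 ...). The check below uses the five values displayed by str(), which are rounded for display, so it is illustrative only.

```r
# First values as printed by str() for the double data (rounded for display).
x <- c(722.11, 285.54, 591.33, 3.42, 349.14)
x_int <- x
storage.mode(x_int) <- "integer"  # same coercion the setup applies via 'mode'
print(x_int)
```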

Results

> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells 3050851 163.0    5709258  305  5709258 305.0
Vcells 5137148  39.2   22140953  169 56666022 432.4
> stats <- microbenchmark(binCounts_x_S = binCounts(x_S, bx = bx), `binCounts(x, idxs)` = binCounts(x, 
+     idxs = idxs, bx = bx), `binCounts(x[idxs])` = binCounts(x[idxs], bx = bx), unit = "ms")

Table: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on double+unsorted data. The top panel shows times in milliseconds, and the bottom panel shows times relative to binCounts_x_S.

  expr                    min       lq     mean   median       uq       max
1 binCounts_x_S      4.749294 4.886123 5.151806 4.961504 5.471378  9.023233
3 binCounts(x[idxs]) 5.022232 5.180465 5.564606 5.241095 5.465776 12.838464
2 binCounts(x, idxs) 4.996679 5.178938 5.505466 5.260656 5.770683  9.607470

  expr                    min       lq     mean   median        uq      max
1 binCounts_x_S      1.000000 1.000000 1.000000 1.000000 1.0000000 1.000000
3 binCounts(x[idxs]) 1.057469 1.060240 1.080127 1.056352 0.9989763 1.422823
2 binCounts(x, idxs) 1.052089 1.059928 1.068648 1.060294 1.0547039 1.064748

Figure: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on double+unsorted data. Outliers are displayed as crosses. Times are in milliseconds.

Sorted simulated data

> x <- sort(x)
> idxs <- sort(idxs)
> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3050925 163.0    5709258 305.0  5709258 305.0
Vcells 5137197  39.2   22186262 169.3 56666022 432.4
> stats <- microbenchmark(binCounts_x_S = binCounts(x_S, bx = bx), `binCounts(x, idxs)` = binCounts(x, 
+     idxs = idxs, bx = bx), `binCounts(x[idxs])` = binCounts(x[idxs], bx = bx), unit = "ms")

Table: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on double+sorted data. The top panel shows times in milliseconds, and the bottom panel shows times relative to binCounts_x_S.

  expr                    min       lq     mean   median       uq      max
1 binCounts_x_S      1.093218 1.172178 1.336594 1.198812 1.225707 4.533939
3 binCounts(x[idxs]) 1.320847 1.432406 1.536461 1.452905 1.475132 5.270048
2 binCounts(x, idxs) 1.330347 1.435493 1.538992 1.454046 1.475350 4.798756

  expr                    min       lq     mean   median       uq      max
1 binCounts_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
3 binCounts(x[idxs]) 1.208219 1.222004 1.149534 1.211954 1.203496 1.162355
2 binCounts(x, idxs) 1.216909 1.224637 1.151428 1.212906 1.203673 1.058408

Figure: Benchmarking of binCounts_x_S, binCounts(x, idxs) and binCounts(x[idxs]) on double+sorted data. Outliers are displayed as crosses. Times are in milliseconds.
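Across all four scenarios, binCounts(x, idxs) and binCounts(x[idxs]) have nearly identical median times. In these benchmarks, passing idxs therefore gives no measurable advantage over subsetting first. Both variants are slower than counting a vector that was subsetted ahead of time (binCounts_x_S). The ratios of the two medians are computed below from values copied from the tables.

```r
# Median times (ms): binCounts(x, idxs) vs binCounts(x[idxs]), per scenario.
med_idxs_arg <- c(int_unsorted = 3.751879, int_sorted = 0.6044015,
                  dbl_unsorted = 5.260656, dbl_sorted = 1.454046)
med_subset   <- c(int_unsorted = 3.735329, int_sorted = 0.6047335,
                  dbl_unsorted = 5.241095, dbl_sorted = 1.452905)

# A ratio near 1 means the idxs argument and explicit subsetting cost the same.
ratio <- med_idxs_arg / med_subset
print(round(ratio, 4))
```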

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 6.8 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('binCounts_subset')

Copyright Dongcan Jiang. Last updated on 2019-09-10 20:34:01 (-0700 UTC). Powered by RSP.
