sum2_subset - HenrikBengtsson/matrixStats GitHub Wiki

matrixStats: Benchmark report


sum2() benchmarks on subsetted computation

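This report benchmarks the performance of sum2() on subsetted computation. Here idxs is a random 70% of the positions of x, and three ways of summing the selected elements are compared: sum2(x_S) on a subset x_S <- x[idxs] created before timing (sum2_x_S), sum2(x, idxs = idxs), which passes the indices to sum2() itself, and sum2(x[idxs]), which creates the subset inside the timed expression.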
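The three variants differ only in where the subsetting happens. A subset sum can in principle be computed without ever materialising x[idxs], by reading x at each index and accumulating. The base-R sketch below illustrates that idea with a hypothetical helper, sum_subset(). It is not how matrixStats implements sum2(), which does its work in compiled code, and an R-level loop like this is much slower than any of the benchmarked variants.

```r
# Hypothetical helper (not part of matrixStats): sums x over the
# positions in idxs without allocating the subset vector x[idxs].
sum_subset <- function(x, idxs) {
  total <- 0
  for (i in idxs) {
    total <- total + x[[i]]
  }
  total
}

set.seed(1)
x <- runif(1000, min = -100, max = 100)
idxs <- sample.int(length(x), size = 700)

# Agrees with summing an explicit copy, up to floating-point rounding
all.equal(sum_subset(x, idxs), sum(x[idxs]))
```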

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> mode <- "integer"  # data type for this section
> data <- rvectors(mode = mode)
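For mode = "integer", rvector() draws uniform doubles and then sets storage.mode(x) <- "integer", which truncates each value toward zero. Since runif() does not return the endpoints of the range for arguments like these, the integer data should lie between -99 and 99. The standalone check below repeats just those two steps; the seed and sample size are arbitrary choices.

```r
set.seed(1)
x <- runif(1000, min = -100, max = 100)
storage.mode(x) <- "integer"   # same coercion as as.integer(): truncation toward zero

typeof(x)   # "integer"
range(x)    # within -99..99, because runif() never returns exactly -100 or 100

# Truncation toward zero on a small example
y <- c(-1.7, 2.9)
storage.mode(y) <- "integer"
y           # c(-1L, 2L)
```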

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3240858 173.1    5709258 305.0  5709258 305.0
Vcells 12921254  98.6   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.002190 | 0.0022480 | 0.0023238 | 0.002268 | 0.0023555 | 0.002870 |
| sum2(x, idxs) | 0.002872 | 0.0029195 | 0.0030163 | 0.002966 | 0.0031015 | 0.003742 |
| sum2(x[idxs]) | 0.003755 | 0.0038740 | 0.0053282 | 0.003972 | 0.0040930 | 0.136824 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 1.311416 | 1.29871 | 1.298030 | 1.307760 | 1.316706 | 1.303833 |
| sum2(x[idxs]) | 1.714612 | 1.72331 | 2.292913 | 1.751323 | 1.737635 | 47.673868 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

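Each entry in the bottom (relative) panel is the corresponding top-panel entry divided by the sum2_x_S entry in the same column. As a worked example, the medians for integer data with n = 1000 reproduce the median column of the relative panel:

```r
# Median times (ms) for integer data with n = 1000, copied from the table above
median_ms <- c(sum2_x_S        = 0.002268,
               `sum2(x, idxs)` = 0.002966,
               `sum2(x[idxs])` = 0.003972)

# Relative time = time / time of the sum2_x_S baseline
relative <- median_ms / median_ms[["sum2_x_S"]]
round(relative, 6)   # 1.000000 1.307760 1.751323
```

Only the median column is used here; ratios of the rounded mean values in the top panel need not match the printed relative means to all six digits.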
n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3238800  173    5709258 305.0  5709258 305.0
Vcells 11791872   90   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.008838 | 0.008996 | 0.0091332 | 0.0090765 | 0.009198 | 0.011473 |
| sum2(x, idxs) | 0.014471 | 0.014606 | 0.0148497 | 0.0146760 | 0.014792 | 0.021946 |
| sum2(x[idxs]) | 0.020736 | 0.021109 | 0.0219148 | 0.0212720 | 0.021467 | 0.064195 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 1.637361 | 1.623611 | 1.625896 | 1.616923 | 1.608176 | 1.912839 |
| sum2(x[idxs]) | 2.346232 | 2.346487 | 2.399457 | 2.343635 | 2.333877 | 5.595311 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3238872 173.0    5709258 305.0  5709258 305.0
Vcells 11855432  90.5   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.074186 | 0.074325 | 0.0746227 | 0.0744155 | 0.0746215 | 0.077288 |
| sum2(x, idxs) | 0.134825 | 0.135009 | 0.1354628 | 0.1351140 | 0.1353220 | 0.143342 |
| sum2(x[idxs]) | 0.221791 | 0.222845 | 0.2255470 | 0.2233340 | 0.2245325 | 0.349794 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 1.817391 | 1.816468 | 1.815302 | 1.815670 | 1.813445 | 1.854648 |
| sum2(x[idxs]) | 2.989661 | 2.998251 | 3.022497 | 3.001176 | 3.008952 | 4.525851 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3238944 173.0    5709258 305.0  5709258 305.0
Vcells 12485481  95.3   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.732001 | 0.814316 | 0.8467126 | 0.8446555 | 0.8714105 | 1.007080 |
| sum2(x, idxs) | 1.769217 | 2.101954 | 2.2156949 | 2.2194915 | 2.3216820 | 3.110682 |
| sum2(x[idxs]) | 2.809793 | 4.210238 | 4.6073393 | 4.3763790 | 4.6239305 | 16.027118 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.00000 | 1.000000 | 1.000000 | 1.000000 | 1.00000 | 1.000000 |
| sum2(x, idxs) | 2.41696 | 2.581250 | 2.616821 | 2.627688 | 2.66428 | 3.088813 |
| sum2(x[idxs]) | 3.83851 | 5.170275 | 5.441444 | 5.181259 | 5.30626 | 15.914444 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239016 173.0    5709258 305.0  5709258 305.0
Vcells 18785529 143.4   34687754 264.7 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 9.60869 | 12.10403 | 13.99947 | 12.71874 | 16.89679 | 17.78463 |
| sum2(x, idxs) | 86.94654 | 96.47109 | 99.66171 | 98.27021 | 103.18522 | 109.55295 |
| sum2(x[idxs]) | 129.36483 | 136.79755 | 147.68536 | 139.58785 | 148.21950 | 409.31043 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 9.048741 | 7.970161 | 7.118964 | 7.726411 | 6.106793 | 6.159981 |
| sum2(x[idxs]) | 13.463316 | 11.301816 | 10.549356 | 10.974975 | 8.772049 | 23.014846 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> mode <- "double"  # data type for this section
> data <- rvectors(mode = mode)
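rvector() also accepts an na_prob argument, which this report leaves at its default of 0, so none of the benchmarked vectors contain missing values. When na_prob > 0, the function overwrites na_prob * n randomly chosen positions with NA. The step in isolation, using an illustrative na_prob = 0.25 that is not a value used anywhere in this report:

```r
set.seed(1)
n <- 1000
na_prob <- 0.25                          # illustrative value only
x <- runif(n, min = -100, max = 100)     # "double" data, as produced by rvector()
x[sample(n, size = na_prob * n)] <- NA   # sample() draws positions without replacement

sum(is.na(x))   # 250
```

Because sample() draws without replacement, exactly na_prob * n distinct positions become NA (when that product is a whole number).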

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239088 173.0    5709258 305.0  5709258 305.0
Vcells 17342682 132.4   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.002153 | 0.0022240 | 0.0023368 | 0.002276 | 0.0024410 | 0.002854 |
| sum2(x, idxs) | 0.002838 | 0.0028990 | 0.0029866 | 0.002927 | 0.0030115 | 0.004507 |
| sum2(x[idxs]) | 0.003739 | 0.0039885 | 0.0043585 | 0.004075 | 0.0042040 | 0.026651 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 1.318161 | 1.303507 | 1.278056 | 1.286028 | 1.233716 | 1.579187 |
| sum2(x[idxs]) | 1.736646 | 1.793390 | 1.865153 | 1.790422 | 1.722245 | 9.338122 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239160 173.0    5709258 305.0  5709258 305.0
Vcells 17352179 132.4   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.008890 | 0.009032 | 0.0091546 | 0.0091230 | 0.009226 | 0.011079 |
| sum2(x, idxs) | 0.014576 | 0.014711 | 0.0148620 | 0.0147845 | 0.014884 | 0.017632 |
| sum2(x[idxs]) | 0.022720 | 0.023244 | 0.0240554 | 0.0235210 | 0.023820 | 0.058946 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 1.639595 | 1.628764 | 1.623442 | 1.620574 | 1.613267 | 1.591479 |
| sum2(x[idxs]) | 2.555680 | 2.573516 | 2.627679 | 2.578209 | 2.581834 | 5.320516 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239232 173.0    5709258 305.0  5709258 305.0
Vcells 17447046 133.2   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.074257 | 0.0744585 | 0.0752039 | 0.0747345 | 0.0753900 | 0.082771 |
| sum2(x, idxs) | 0.146503 | 0.1467050 | 0.1472006 | 0.1468430 | 0.1469555 | 0.163726 |
| sum2(x[idxs]) | 0.257554 | 0.2612740 | 0.3219299 | 0.2669460 | 0.3883100 | 0.453088 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 1.972918 | 1.970292 | 1.957354 | 1.964862 | 1.949270 | 1.978060 |
| sum2(x[idxs]) | 3.468414 | 3.508988 | 4.280762 | 3.571925 | 5.150683 | 5.473994 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239304 173.0    5709258 305.0  5709258 305.0
Vcells 18392476 140.4   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 0.796457 | 1.240280 | 1.258551 | 1.257955 | 1.290671 | 1.392572 |
| sum2(x, idxs) | 4.741483 | 5.288318 | 5.366747 | 5.380356 | 5.445163 | 5.830528 |
| sum2(x[idxs]) | 5.928098 | 9.805182 | 10.399687 | 9.983529 | 10.196608 | 26.636727 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 5.953219 | 4.263808 | 4.264226 | 4.277067 | 4.218864 | 4.186877 |
| sum2(x[idxs]) | 7.443086 | 7.905616 | 8.263221 | 7.936320 | 7.900241 | 19.127720 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239376 173.1    5709258 305.0  5709258 305.0
Vcells 27842524 212.5   50126364 382.5 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 9.611336 | 12.09921 | 15.44088 | 13.55772 | 20.02822 | 22.08697 |
| sum2(x, idxs) | 92.115922 | 140.56294 | 149.20767 | 147.07898 | 159.33201 | 191.04255 |
| sum2(x[idxs]) | 133.500922 | 169.75578 | 184.78027 | 180.82242 | 186.80018 | 460.96444 |

| expr | min | lq | mean | median | uq | max |
|:--|--:|--:|--:|--:|--:|--:|
| sum2_x_S | 1.000000 | 1.00000 | 1.000000 | 1.00000 | 1.000000 | 1.000000 |
| sum2(x, idxs) | 9.584091 | 11.61753 | 9.663161 | 10.84835 | 7.955376 | 8.649558 |
| sum2(x[idxs]) | 13.889944 | 14.03032 | 11.966955 | 13.33722 | 9.326849 | 20.870423 |

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 1.31 mins.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('sum2_subset')
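To rerun a single comparison by hand instead of regenerating the whole report, the steps from the Results sections can be repeated directly. The sketch below assumes the matrixStats and microbenchmark packages are installed (it does nothing otherwise), uses an arbitrary seed and vector size, and first checks that the three variants return the same sum before timing them. Timings on other machines will differ from those reported above.

```r
if (requireNamespace("matrixStats", quietly = TRUE) &&
    requireNamespace("microbenchmark", quietly = TRUE)) {
  set.seed(1)
  x <- runif(1e5, min = -100, max = 100)
  idxs <- sample.int(length(x), size = length(x) * 0.7)
  x_S <- x[idxs]

  # All three variants should agree, up to floating-point rounding
  s <- c(matrixStats::sum2(x_S),
         matrixStats::sum2(x, idxs = idxs),
         matrixStats::sum2(x[idxs]))
  stopifnot(isTRUE(all.equal(s[1], s[2])), isTRUE(all.equal(s[1], s[3])))

  # Same three expressions as in the Results sections
  stats <- microbenchmark::microbenchmark(
    sum2_x_S        = matrixStats::sum2(x_S),
    `sum2(x, idxs)` = matrixStats::sum2(x, idxs = idxs),
    `sum2(x[idxs])` = matrixStats::sum2(x[idxs]),
    unit = "ms"
  )
  print(stats)
}
```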

Copyright Dongcan Jiang. Last updated on 2019-09-10 21:10:42 (-0700 UTC). Powered by RSP.