madDiff_subset - HenrikBengtsson/matrixStats GitHub Wiki

matrixStats: Benchmark report


madDiff() benchmarks on subsetted computation

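This report benchmarks the performance of madDiff() on subsetted computation. It compares three variants: madDiff(x_S) on a vector subsetted beforehand (labelled madDiff_x_S), madDiff(x, idxs = idxs), and madDiff(x[idxs]).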
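
The following is a minimal sketch. It assumes the matrixStats package is installed, and the seed and vector length are illustrative choices, not values taken from the report. All three variants apply madDiff() to the same elements in the same order, so they should return the same value. They differ only in where the subsetting happens. In the transcripts, x_S is created before microbenchmark() is called, so the madDiff_x_S timings exclude the cost of subsetting and the other two variants include it.

```r
# Sketch: the three call styles compared in this report (assumes matrixStats is installed)
library(matrixStats)

set.seed(1)
x <- runif(1000, min = -100, max = 100)

# Random 70% subset of positions, built the same way as in the transcripts
idxs <- sample.int(length(x), size = length(x) * 0.7)
x_S <- x[idxs]  # subset created up front, outside any timing

res_pre    <- madDiff(x_S)             # timed as "madDiff_x_S"
res_idxs   <- madDiff(x, idxs = idxs)  # subsetting requested via the idxs argument
res_inline <- madDiff(x[idxs])         # subsetting done inline, inside the timed call

# Same elements in the same order, so the estimates should agree
stopifnot(isTRUE(all.equal(res_pre, res_idxs)),
          isTRUE(all.equal(res_pre, res_inline)))
res_pre
```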

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     }     else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
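
In this section mode is "integer". The heading implies this, but the transcript does not show the assignment. rvector() first draws doubles with runif() and then sets storage.mode(x) <- mode, so the integer data are the uniform draws truncated toward zero. Here is a minimal base-R sketch of that coercion, with an illustrative seed and length:

```r
# Sketch: what storage.mode(x) <- "integer" does inside rvector()
set.seed(1)
x <- runif(1000, min = -100, max = 100)  # doubles strictly inside (-100, 100)

xi <- x
storage.mode(xi) <- "integer"            # same coercion rvector() applies for mode = "integer"

# The coercion truncates toward zero, exactly like as.integer()
stopifnot(is.integer(xi), identical(xi, as.integer(x)), all(xi == trunc(x)))

# Only the values -99, ..., 99 are possible, so ties are frequent
range(xi)
length(unique(xi))
```

Because every draw in (-1, 1) maps to 0, the value 0 is about twice as likely as any other integer in this data.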

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3234356 172.8    5709258 305.0  5709258 305.0
Vcells 12934169  98.7   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 1000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       0.055190 0.0564065 0.0574278 0.0568090 0.0574705  0.077206
3 madDiff(x[idxs])  0.057079 0.0582120 0.0618246 0.0588055 0.0595500  0.307727
2 madDiff(x, idxs)  0.057541 0.0586050 0.0598237 0.0589665 0.0597920  0.088428

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
3 madDiff(x[idxs])  1.034227  1.032009  1.076562  1.035144  1.036184  3.985791
2 madDiff(x, idxs)  1.042598  1.038976  1.041720  1.037978  1.040395  1.145351
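
The bottom panel appears to be the top panel divided, column by column, by the madDiff_x_S row. For example, the relative max of madDiff(x[idxs]) is 0.307727 / 0.077206, about 3.99. The sketch below rebuilds the relative panel from the millisecond values in this table using base R. Expect small rounding differences, because the reported milliseconds are themselves rounded.

```r
# Sketch: rebuild the relative (bottom) panel from the millisecond (top) panel.
# Values copied from the integer, n = 1000 table.
stat_names <- c("min", "lq", "mean", "median", "uq", "max")
expr_names <- c("madDiff_x_S", "madDiff(x[idxs])", "madDiff(x, idxs)")

abs_ms <- matrix(c(
  0.055190, 0.0564065, 0.0574278, 0.0568090, 0.0574705, 0.077206,
  0.057079, 0.0582120, 0.0618246, 0.0588055, 0.0595500, 0.307727,
  0.057541, 0.0586050, 0.0598237, 0.0589665, 0.0597920, 0.088428
), nrow = 3, byrow = TRUE, dimnames = list(expr_names, stat_names))

# Divide each column by the corresponding statistic of the reference row
rel <- sweep(abs_ms, MARGIN = 2, STATS = abs_ms["madDiff_x_S", ], FUN = "/")
round(rel, 6)
```

Each column is normalized separately, so the relative min and max compare extremes from different runs. A relative max below 1, such as 0.168 for madDiff(x[idxs]) in the integer n = 1000000 table, only means that the madDiff_x_S reference happened to record a larger outlier.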

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 1000). Outliers are displayed as crosses. Times are in milliseconds.
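
Each subsection draws idxs with sample.int(), which samples without replacement by default. As a result, idxs holds 0.7 * n distinct positions in random order. madDiff() estimates scale from differences between consecutive elements (see its diff argument), so the elements of x_S are differenced in that shuffled order, not in their original order in x. All three variants see the same order, so their timings still compare like with like. A base-R sketch of the construction, with an illustrative seed:

```r
# Sketch: how each benchmark builds its 70% subset
set.seed(1)
x <- runif(1000, min = -100, max = 100)

# sample.int() samples without replacement by default
idxs <- sample.int(length(x), size = length(x) * 0.7)
x_S <- x[idxs]

stopifnot(length(idxs) == 700L,       # 0.7 * n positions
          anyDuplicated(idxs) == 0L,  # no position drawn twice
          all(idxs >= 1L & idxs <= length(x)))

# idxs is not sorted, so x_S holds the kept elements in shuffled order
is.unsorted(idxs)

# Hypothetical alternative (not used in this report): sorted indices keep x's order
x_S_ordered <- x[sort(idxs)]
```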

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232097 172.7    5709258 305.0  5709258 305.0
Vcells 11805394  90.1   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 10000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       0.224246  0.231654 0.2367327 0.2368285 0.2414330  0.276658
3 madDiff(x[idxs])  0.235690  0.242172 0.2499910 0.2497330 0.2546955  0.346952
2 madDiff(x, idxs)  0.236465  0.242802 0.2500311 0.2499390 0.2556420  0.275142

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000 1.0000000
3 madDiff(x[idxs])  1.051033  1.045404  1.056005  1.054489  1.054932 1.2540827
2 madDiff(x, idxs)  1.054489  1.048123  1.056175  1.055359  1.058853 0.9945203

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 10000). Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232169 172.7    5709258 305.0  5709258 305.0
Vcells 11868954  90.6   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 100000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       2.095817  2.243695  2.389170  2.282126  2.348625  8.611197
2 madDiff(x, idxs)  2.230470  2.363672  2.647164  2.423176  2.512975  8.659484
3 madDiff(x[idxs])  2.215638  2.397924  2.642880  2.446655  2.492496  8.566503

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000 1.0000000
2 madDiff(x, idxs)  1.064248  1.053473  1.107985  1.061806  1.069977 1.0056075
3 madDiff(x[idxs])  1.057171  1.068738  1.106192  1.072095  1.061257 0.9948098

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 100000). Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232241 172.7    5709258 305.0  5709258 305.0
Vcells 12499003  95.4   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 1000000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       18.44959  18.95174  24.86004  20.37979  25.57697 292.22811
2 madDiff(x, idxs)  21.57473  22.66015  31.62577  24.88276  29.63818 288.79574
3 madDiff(x[idxs])  21.43227  22.80355  27.10503  27.99392  29.61907  49.22115

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000 1.0000000
2 madDiff(x, idxs)  1.169388  1.195676  1.272153  1.220953  1.158784 0.9882545
3 madDiff(x[idxs])  1.161666  1.203243  1.090305  1.373612  1.158037 0.1684340

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 1000000). Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232310 172.7    5709258 305.0  5709258 305.0
Vcells 18799046 143.5   29499156 225.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 10000000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       173.4395  192.0317  233.7796  200.0665  221.1509  524.2280
2 madDiff(x, idxs)  290.4968  320.6389  387.9749  334.1310  364.1433  668.4249
3 madDiff(x[idxs])  303.8622  325.1975  387.9358  335.7142  368.6222  651.9459

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
2 madDiff(x, idxs)  1.674917  1.669719  1.659576  1.670099  1.646583  1.275065
3 madDiff(x[idxs])  1.751978  1.693458  1.659409  1.678013  1.666836  1.243631

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data (n = 10000000). Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     }     else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
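
In this section mode is "double". As before, the heading implies this but the transcript does not show the assignment. storage.mode(x) <- "double" leaves the runif() draws unchanged, so unlike the integer data these vectors are effectively free of ties. Each element also occupies 8 bytes instead of 4. A base-R sketch contrasting the two storage modes, with an illustrative seed:

```r
# Sketch: double versus integer data as produced by rvector()
set.seed(1)
x <- runif(1000, min = -100, max = 100)

xd <- x
storage.mode(xd) <- "double"   # no-op: runif() already returns doubles
xi <- x
storage.mode(xi) <- "integer"  # truncated toward zero, for comparison

stopifnot(identical(xd, x))

# Distinct values: essentially all of them for double, at most 199 for integer
c(double = length(unique(xd)), integer = length(unique(xi)))

# Storage: 8 bytes per double element versus 4 per integer element
c(double = as.numeric(object.size(xd)), integer = as.numeric(object.size(xi)))
```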

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used (Mb)
Ncells  3232391 172.7    5709258  305  5709258  305
Vcells 17356258 132.5   61598888  470 61463574  469
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 1000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       0.070199 0.0715985 0.0727082 0.0722565 0.0730440  0.094207
3 madDiff(x[idxs])  0.072466 0.0737975 0.0762348 0.0743830 0.0751275  0.218112
2 madDiff(x, idxs)  0.072472 0.0740670 0.0748416 0.0745635 0.0752430  0.084222

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
3 madDiff(x[idxs])  1.032294  1.030713  1.048503  1.029430  1.028524  2.315242
2 madDiff(x, idxs)  1.032379  1.034477  1.029341  1.031928  1.030105  0.894010

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 1000). Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used (Mb)
Ncells  3232457 172.7    5709258  305  5709258  305
Vcells 17365745 132.5   61598888  470 61463574  469
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 10000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       0.317826 0.3262430 0.3311348 0.3301385 0.3354795  0.353511
2 madDiff(x, idxs)  0.331684 0.3391555 0.3455184 0.3444140 0.3505610  0.370233
3 madDiff(x[idxs])  0.332179 0.3394500 0.3471538 0.3448690 0.3507025  0.464242

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
2 madDiff(x, idxs)  1.043602  1.039579  1.043437  1.043241  1.044955  1.047303
3 madDiff(x[idxs])  1.045160  1.040482  1.048376  1.044619  1.045377  1.313232

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 10000). Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used (Mb)
Ncells  3232529 172.7    5709258  305  5709258  305
Vcells 17460624 133.3   61598888  470 61463574  469
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 100000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       2.950452  3.116595  3.321669  3.177805  3.455529  4.418886
3 madDiff(x[idxs])  3.139160  3.289204  3.657241  3.368537  3.667559 10.142200
2 madDiff(x, idxs)  3.098777  3.283048  3.520841  3.370732  3.456164  8.919457

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
3 madDiff(x[idxs])  1.063959  1.055384  1.101025  1.060020  1.061360  2.295194
2 madDiff(x, idxs)  1.050272  1.053409  1.059962  1.060711  1.000184  2.018485

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 100000). Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3232601 172.7    5709258  305  5709258 305.0
Vcells 18406067 140.5   61598888  470 61572274 469.8
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 1000000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       25.79942  27.48665  30.40725  28.45360  32.35795  57.97113
3 madDiff(x[idxs])  34.84448  37.06804  42.82830  38.03059  43.09852 321.29250
2 madDiff(x, idxs)  35.60745  37.15369  40.41188  38.27959  43.03884  58.87006

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
3 madDiff(x[idxs])  1.350591  1.348583  1.408490  1.336583  1.331930  5.542284
2 madDiff(x, idxs)  1.380165  1.351699  1.329021  1.345334  1.330085  1.015506

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 1000000). Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3232673 172.7    5709258  305  5709258 305.0
Vcells 27856115 212.6   61598888  470 61572274 469.8
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 10000000). The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       302.1222  330.8240  395.3541  347.1689  370.4564  618.6347
2 madDiff(x, idxs)  466.7321  496.3730  546.1398  506.1658  522.6195  786.5121
3 madDiff(x[idxs])  475.4246  500.2915  597.8230  516.8004  755.2356  793.2776

  expr                   min        lq      mean    median        uq       max
1 madDiff_x_S       1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
2 madDiff(x, idxs)  1.544846  1.500414  1.381394  1.457981  1.410745  1.271367
3 madDiff(x[idxs])  1.573617  1.512259  1.512120  1.488614  2.038663  1.282304

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data (n = 10000000). Outliers are displayed as crosses. Times are in milliseconds.
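
Collecting the median timings from the tables above shows how the relative cost of madDiff(x, idxs) over madDiff(x_S) changes with n. In these runs the ratio stays within about 6% up to n = 100000 and rises to roughly 1.2 to 1.7 at the two largest sizes. One plausible reading, not tested directly in this report, is that the gap reflects the cost of the subsetting itself, which the madDiff_x_S timings exclude. A base-R sketch of the calculation, with the medians copied from the tables:

```r
# Sketch: median timings (ms) copied from the tables above, by vector length
n <- c(1000L, 10000L, 100000L, 1000000L, 10000000L)
median_ms <- list(
  integer = cbind(
    x_S  = c(0.0568090, 0.2368285, 2.282126, 20.37979, 200.0665),
    idxs = c(0.0589665, 0.2499390, 2.423176, 24.88276, 334.1310)
  ),
  double = cbind(
    x_S  = c(0.0722565, 0.3301385, 3.177805, 28.45360, 347.1689),
    idxs = c(0.0745635, 0.3444140, 3.370732, 38.27959, 506.1658)
  )
)

# Median of madDiff(x, idxs) relative to madDiff(x_S), per data type and n
ratio <- sapply(median_ms, function(m) m[, "idxs"] / m[, "x_S"])
rownames(ratio) <- sprintf("n = %d", n)
round(ratio, 2)
```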

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 4.82 mins.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('madDiff_subset')

Copyright Dongcan Jiang. Last updated on 2019-09-10 21:03:27 (-0700 UTC). Powered by RSP.
