logSumExp_subset - HenrikBengtsson/matrixStats GitHub Wiki

matrixStats: Benchmark report


logSumExp() benchmarks on subsetted computation

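This report benchmarks logSumExp() on a subset of a vector. It compares three ways of selecting the subset: precomputing it (x_S), passing the indices via the idxs argument, and subsetting inline (x[idxs]).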
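As a reminder of what is being timed, the sketch below computes log(sum(exp(x))) in base R and checks that the three subsetting forms give the same value. The helper name log_sum_exp_ref is hypothetical and the max-shift formulation is an assumption for illustration; it is not matrixStats' C implementation.

```r
# Hypothetical base-R reference for log(sum(exp(lx))); illustration only.
log_sum_exp_ref <- function(lx, idxs = NULL) {
  if (!is.null(idxs)) lx <- lx[idxs]   # subset first, if requested
  if (length(lx) == 0L) return(-Inf)   # log of an empty sum
  m <- max(lx)
  if (!is.finite(m)) return(m)         # all -Inf, or +Inf/NA/NaN present
  m + log(sum(exp(lx - m)))            # shift by the max to avoid overflow
}

set.seed(1)
x <- runif(1000, min = -100, max = 100)
idxs <- sample.int(length(x), size = length(x) * 0.7)
x_S <- x[idxs]

# The three forms benchmarked in this report select the same elements.
a <- log_sum_exp_ref(x_S)
b <- log_sum_exp_ref(x, idxs = idxs)
d <- log_sum_exp_ref(x[idxs])
stopifnot(isTRUE(all.equal(a, b)), isTRUE(all.equal(a, d)))

# The naive formula overflows where the shifted one does not.
naive  <- log(sum(exp(c(1000, 1000))))     # exp(1000) overflows to Inf
stable <- log_sum_exp_ref(c(1000, 1000))   # 1000 + log(2)
```

Since all three forms return the same value, any timing difference between them comes from how the subset is extracted, not from the log-sum-exp arithmetic itself.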

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "double")
> data <- data[1:4]

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3235393 172.8    5709258 305.0  5709258 305.0
Vcells 7700159  58.8   24515964 187.1 57084605 435.6
> stats <- microbenchmark(logSumExp_x_S = logSumExp(x_S), `logSumExp(x, idxs)` = logSumExp(x, idxs = idxs), 
+     `logSumExp(x[idxs])` = logSumExp(x[idxs]), unit = "ms")

Table: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 1000 data. The top panel shows times in milliseconds and the bottom panel shows times relative to logSumExp_x_S.

  expr                    min        lq      mean   median        uq      max
1 logSumExp_x_S      0.009721 0.0097745 0.0098346 0.009805 0.0098435 0.010916
2 logSumExp(x, idxs) 0.011089 0.0111535 0.0112625 0.011206 0.0112705 0.013398
3 logSumExp(x[idxs]) 0.011264 0.0114130 0.0117122 0.011497 0.0116260 0.030902

  expr                    min       lq     mean   median       uq      max
1 logSumExp_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
2 logSumExp(x, idxs) 1.140726 1.141081 1.145191 1.142886 1.144969 1.227373
3 logSumExp(x[idxs]) 1.158729 1.167630 1.190924 1.172565 1.181084 2.830890
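The relative times are each timing divided by the corresponding timing of the reference expression, logSumExp_x_S. A minimal sketch using the median column of the n = 1000 table (the variable names are illustrative, not part of the report's code):

```r
# Median timings (ms) copied from the top panel of the n = 1000 table.
median_ms <- c(
  "logSumExp_x_S"      = 0.009805,
  "logSumExp(x, idxs)" = 0.011206,
  "logSumExp(x[idxs])" = 0.011497
)

# Relative time: each entry divided by the reference (first) entry.
relative <- median_ms / median_ms[["logSumExp_x_S"]]
print(round(relative, 6))
```

The same division is applied column by column, which is why a relative value can drop below 1 when the reference expression happens to have the largest outlier in that column.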

Figure: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 1000 data. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3231078 172.6    5709258 305.0  5709258 305.0
Vcells 7362957  56.2   24515964 187.1 57084605 435.6
> stats <- microbenchmark(logSumExp_x_S = logSumExp(x_S), `logSumExp(x, idxs)` = logSumExp(x, idxs = idxs), 
+     `logSumExp(x[idxs])` = logSumExp(x[idxs]), unit = "ms")

Table: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 10000 data. The top panel shows times in milliseconds and the bottom panel shows times relative to logSumExp_x_S.

  expr                    min        lq      mean    median        uq      max
1 logSumExp_x_S      0.090920 0.0915330 0.0957147 0.0961145 0.0973700 0.137761
3 logSumExp(x[idxs]) 0.104686 0.1057815 0.1111546 0.1101890 0.1124380 0.204400
2 logSumExp(x, idxs) 0.109608 0.1103155 0.1150915 0.1157390 0.1178045 0.130953

  expr                    min       lq     mean   median       uq       max
1 logSumExp_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000
3 logSumExp(x[idxs]) 1.151408 1.155665 1.161312 1.146435 1.154750 1.4837291
2 logSumExp(x, idxs) 1.205543 1.205199 1.202444 1.204178 1.209864 0.9505811

Figure: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 10000 data. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3231150 172.6    5709258 305.0  5709258 305.0
Vcells 7458017  57.0   24515964 187.1 57084605 435.6
> stats <- microbenchmark(logSumExp_x_S = logSumExp(x_S), `logSumExp(x, idxs)` = logSumExp(x, idxs = idxs), 
+     `logSumExp(x[idxs])` = logSumExp(x[idxs]), unit = "ms")

Table: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 100000 data. The top panel shows times in milliseconds and the bottom panel shows times relative to logSumExp_x_S.

  expr                    min       lq      mean    median       uq      max
1 logSumExp_x_S      0.906506 0.982352 0.9884256 0.9853045 1.011005 1.054412
3 logSumExp(x[idxs]) 1.089883 1.188050 1.2548275 1.2257790 1.323451 1.376242
2 logSumExp(x, idxs) 1.418100 1.512825 1.5407543 1.5400650 1.581674 1.641496

  expr                    min       lq     mean   median       uq      max
1 logSumExp_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
3 logSumExp(x[idxs]) 1.202290 1.209393 1.269521 1.244061 1.309046 1.305222
2 logSumExp(x, idxs) 1.564358 1.540004 1.558796 1.563035 1.564458 1.556788

Figure: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 100000 data. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3231222 172.6    5709258 305.0  5709258 305.0
Vcells 8403066  64.2   24515964 187.1 57084605 435.6
> stats <- microbenchmark(logSumExp_x_S = logSumExp(x_S), `logSumExp(x, idxs)` = logSumExp(x, idxs = idxs), 
+     `logSumExp(x[idxs])` = logSumExp(x[idxs]), unit = "ms")

Table: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 1000000 data. The top panel shows times in milliseconds and the bottom panel shows times relative to logSumExp_x_S.

  expr                     min       lq     mean   median       uq      max
1 logSumExp_x_S       9.071947 11.02365 11.74693 11.83383 12.54254 15.17648
3 logSumExp(x[idxs]) 15.574445 19.81830 21.16121 20.43505 22.24304 32.64676
2 logSumExp(x, idxs) 32.296997 34.96063 37.18128 36.57977 38.56565 62.20720

  expr                    min       lq     mean   median       uq      max
1 logSumExp_x_S      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
3 logSumExp(x[idxs]) 1.716770 1.797799 1.801425 1.726832 1.773408 2.151142
2 logSumExp(x, idxs) 3.560095 3.171421 3.165191 3.091118 3.074789 4.098923

Figure: Benchmarking of logSumExp_x_S, logSumExp(x, idxs) and logSumExp(x[idxs]) on n = 1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 11.29 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('logSumExp_subset')
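A single case can also be timed without the internal benchmark() helper. The following is a minimal sketch, assuming the matrixStats and microbenchmark packages are installed; it mirrors the n = 1000 case but does not reproduce the exact data, subset, or timings reported above.

```r
# Sketch only: assumes matrixStats and microbenchmark are installed.
set.seed(1)
x <- runif(1000, min = -100, max = 100)                # same range as rvector()
idxs <- sample.int(length(x), size = length(x) * 0.7)  # random 70% subset
x_S <- x[idxs]                                         # precomputed subset

if (requireNamespace("matrixStats", quietly = TRUE) &&
    requireNamespace("microbenchmark", quietly = TRUE)) {
  stats <- microbenchmark::microbenchmark(
    logSumExp_x_S        = matrixStats::logSumExp(x_S),
    `logSumExp(x, idxs)` = matrixStats::logSumExp(x, idxs = idxs),
    `logSumExp(x[idxs])` = matrixStats::logSumExp(x[idxs]),
    unit = "ms"
  )
  print(stats)
} else {
  message("matrixStats and/or microbenchmark not installed; skipping benchmark.")
}
```

Timings will differ across machines, R versions and package versions, so only the relative ordering of the three expressions is meaningfully comparable with the tables in this report.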

Copyright Dongcan Jiang. Last updated on 2019-09-10 20:58:28 (-0700 UTC). Powered by RSP.
