matrixStats: Benchmark report

sum2() benchmarks on subsetted computation

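This report benchmarks the performance of sum2() on subsetted computation. It compares three ways of summing a subset of a vector: summing a precomputed subset (sum2_x_S), passing the indices through the idxs argument (sum2(x, idxs)), and subsetting inline (sum2(x[idxs])).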
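As a minimal sketch of what is being compared, the three call patterns can be checked against base R's sum() on a small integer vector. The vector, the seed, and the variable name expected are illustrative assumptions rather than the report's data. The matrixStats calls are guarded so the snippet still runs when the package is not installed.

```r
# Illustrative data: a small integer vector and a random 70% subset of positions.
set.seed(1)
x <- sample(-100:100, size = 1000, replace = TRUE)
idxs <- sample.int(length(x), size = length(x) * 0.7)

# Pattern 1 works on a subset that was materialized ahead of time.
x_S <- x[idxs]
expected <- sum(x_S)  # base R reference value

if (requireNamespace("matrixStats", quietly = TRUE)) {
  sum2 <- matrixStats::sum2
  stopifnot(
    sum2(x_S) == expected,             # precomputed subset
    sum2(x, idxs = idxs) == expected,  # indices passed via idxs
    sum2(x[idxs]) == expected          # subset created inline
  )
}
```

All three calls return the same sum. The benchmarks below differ only in where the cost of subsetting is paid.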

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
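One detail of rvector() worth spelling out is that storage.mode(x) <- "integer" truncates the uniform doubles toward zero rather than rounding them. The short sketch below illustrates this on hand-picked values (an assumption for illustration, not the benchmark data). It also shows how the list names used later, such as "n = 1000", come out of sprintf().

```r
# Hand-picked doubles to show the effect of the coercion used in rvector().
x <- c(-99.7, -0.4, 0.4, 99.7)
storage.mode(x) <- "integer"   # truncates toward zero
print(x)                       # -99 0 0 99

# Names are built from the vector lengths, which are integers, so "%d" applies.
lens <- c(1000L, 10000L)
nms <- sprintf("n = %d", lens)
print(nms)                     # "n = 1000" "n = 10000"
```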

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3240858 173.1    5709258 305.0  5709258 305.0
Vcells 12921254  98.6   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.002190	0.0022480	0.0023238	0.002268	0.0023555	0.002870
2	sum2(x, idxs)	0.002872	0.0029195	0.0030163	0.002966	0.0031015	0.003742
3	sum2(x[idxs])	0.003755	0.0038740	0.0053282	0.003972	0.0040930	0.136824

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.00000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	1.311416	1.29871	1.298030	1.307760	1.316706	1.303833
3	sum2(x[idxs])	1.714612	1.72331	2.292913	1.751323	1.737635	47.673868

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.
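The relative times in the bottom panel of each table can be reproduced from the top panel by dividing every column by its first row, which is why sum2_x_S is always 1. A minimal sketch using the min and median columns of the integer n = 1000 table (the data frame names times and rel are assumptions for illustration):

```r
# Absolute times in milliseconds, copied from the integer n = 1000 table.
times <- data.frame(
  expr   = c("sum2_x_S", "sum2(x, idxs)", "sum2(x[idxs])"),
  min    = c(0.002190, 0.002872, 0.003755),
  median = c(0.002268, 0.002966, 0.003972)
)

# Divide each numeric column by its first entry (the sum2_x_S baseline).
rel <- times
rel[-1] <- lapply(times[-1], function(col) col / col[1])
print(rel)
```

For example, the min entry for sum2(x, idxs) is 0.002872 / 0.002190, which is about 1.3114.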

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3238800  173    5709258 305.0  5709258 305.0
Vcells 11791872   90   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.008838	0.008996	0.0091332	0.0090765	0.009198	0.011473
2	sum2(x, idxs)	0.014471	0.014606	0.0148497	0.0146760	0.014792	0.021946
3	sum2(x[idxs])	0.020736	0.021109	0.0219148	0.0212720	0.021467	0.064195

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	1.637361	1.623611	1.625896	1.616923	1.608176	1.912839
3	sum2(x[idxs])	2.346232	2.346487	2.399457	2.343635	2.333877	5.595311

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3238872 173.0    5709258 305.0  5709258 305.0
Vcells 11855432  90.5   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.074186	0.074325	0.0746227	0.0744155	0.0746215	0.077288
2	sum2(x, idxs)	0.134825	0.135009	0.1354628	0.1351140	0.1353220	0.143342
3	sum2(x[idxs])	0.221791	0.222845	0.2255470	0.2233340	0.2245325	0.349794

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	1.817391	1.816468	1.815302	1.815670	1.813445	1.854648
3	sum2(x[idxs])	2.989661	2.998251	3.022497	3.001176	3.008952	4.525851

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3238944 173.0    5709258 305.0  5709258 305.0
Vcells 12485481  95.3   28839795 220.1 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.732001	0.814316	0.8467126	0.8446555	0.8714105	1.007080
2	sum2(x, idxs)	1.769217	2.101954	2.2156949	2.2194915	2.3216820	3.110682
3	sum2(x[idxs])	2.809793	4.210238	4.6073393	4.3763790	4.6239305	16.027118

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.00000	1.000000	1.000000	1.000000	1.00000	1.000000
2	sum2(x, idxs)	2.41696	2.581250	2.616821	2.627688	2.66428	3.088813
3	sum2(x[idxs])	3.83851	5.170275	5.441444	5.181259	5.30626	15.914444

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239016 173.0    5709258 305.0  5709258 305.0
Vcells 18785529 143.4   34687754 264.7 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	9.60869	12.10403	13.99947	12.71874	16.89679	17.78463
2	sum2(x, idxs)	86.94654	96.47109	99.66171	98.27021	103.18522	109.55295
3	sum2(x[idxs])	129.36483	136.79755	147.68536	139.58785	148.21950	409.31043

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	9.048741	7.970161	7.118964	7.726411	6.106793	6.159981
3	sum2(x[idxs])	13.463316	11.301816	10.549356	10.974975	8.772049	23.014846

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on integer data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239088 173.0    5709258 305.0  5709258 305.0
Vcells 17342682 132.4   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.002153	0.0022240	0.0023368	0.002276	0.0024410	0.002854
2	sum2(x, idxs)	0.002838	0.0028990	0.0029866	0.002927	0.0030115	0.004507
3	sum2(x[idxs])	0.003739	0.0039885	0.0043585	0.004075	0.0042040	0.026651

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	1.318161	1.303507	1.278056	1.286028	1.233716	1.579187
3	sum2(x[idxs])	1.736646	1.793390	1.865153	1.790422	1.722245	9.338122

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239160 173.0    5709258 305.0  5709258 305.0
Vcells 17352179 132.4   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.008890	0.009032	0.0091546	0.0091230	0.009226	0.011079
2	sum2(x, idxs)	0.014576	0.014711	0.0148620	0.0147845	0.014884	0.017632
3	sum2(x[idxs])	0.022720	0.023244	0.0240554	0.0235210	0.023820	0.058946

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	1.639595	1.628764	1.623442	1.620574	1.613267	1.591479
3	sum2(x[idxs])	2.555680	2.573516	2.627679	2.578209	2.581834	5.320516

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239232 173.0    5709258 305.0  5709258 305.0
Vcells 17447046 133.2   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.074257	0.0744585	0.0752039	0.0747345	0.0753900	0.082771
2	sum2(x, idxs)	0.146503	0.1467050	0.1472006	0.1468430	0.1469555	0.163726
3	sum2(x[idxs])	0.257554	0.2612740	0.3219299	0.2669460	0.3883100	0.453088

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	1.972918	1.970292	1.957354	1.964862	1.949270	1.978060
3	sum2(x[idxs])	3.468414	3.508988	4.280762	3.571925	5.150683	5.473994

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239304 173.0    5709258 305.0  5709258 305.0
Vcells 18392476 140.4   41705304 318.2 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	0.796457	1.240280	1.258551	1.257955	1.290671	1.392572
2	sum2(x, idxs)	4.741483	5.288318	5.366747	5.380356	5.445163	5.830528
3	sum2(x[idxs])	5.928098	9.805182	10.399687	9.983529	10.196608	26.636727

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	sum2(x, idxs)	5.953219	4.263808	4.264226	4.277067	4.218864	4.186877
3	sum2(x[idxs])	7.443086	7.905616	8.263221	7.936320	7.900241	19.127720

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3239376 173.1    5709258 305.0  5709258 305.0
Vcells 27842524 212.5   50126364 382.5 87357391 666.5
> stats <- microbenchmark(sum2_x_S = sum2(x_S), `sum2(x, idxs)` = sum2(x, idxs = idxs), `sum2(x[idxs])` = sum2(x[idxs]), 
+     unit = "ms")

Table: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows times relative to sum2_x_S.

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	9.611336	12.09921	15.44088	13.55772	20.02822	22.08697
2	sum2(x, idxs)	92.115922	140.56294	149.20767	147.07898	159.33201	191.04255
3	sum2(x[idxs])	133.500922	169.75578	184.78027	180.82242	186.80018	460.96444

	expr	min	lq	mean	median	uq	max
1	sum2_x_S	1.000000	1.00000	1.000000	1.00000	1.000000	1.000000
2	sum2(x, idxs)	9.584091	11.61753	9.663161	10.84835	7.955376	8.649558
3	sum2(x[idxs])	13.889944	14.03032	11.966955	13.33722	9.326849	20.870423

Figure: Benchmarking of sum2_x_S, sum2(x, idxs) and sum2(x[idxs]) on double data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.
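To read the results across vector sizes, the median relative times from the bottom panels above can be collected into one data frame. This is only a sketch for summarizing the reported numbers. The object name med and the column names are assumptions, and the values are copied from the tables rather than re-measured.

```r
# Median times relative to sum2_x_S, copied from the bottom panels above.
# "idxs" columns are sum2(x, idxs); "inline" columns are sum2(x[idxs]).
med <- data.frame(
  n              = c(1e3, 1e4, 1e5, 1e6, 1e7),
  integer_idxs   = c(1.307760, 1.616923, 1.815670, 2.627688, 7.726411),
  integer_inline = c(1.751323, 2.343635, 3.001176, 5.181259, 10.974975),
  double_idxs    = c(1.286028, 1.620574, 1.964862, 4.277067, 10.84835),
  double_inline  = c(1.790422, 2.578209, 3.571925, 7.936320, 13.33722)
)
print(med)

# Check whether passing idxs had a lower median than inline subsetting
# at every size, for both data types, in these particular runs.
idxs_faster <- med$integer_idxs < med$integer_inline &
  med$double_idxs < med$double_inline
print(all(idxs_faster))
```

In these runs both subsetting variants fall further behind the precomputed subset as n grows, and sum2(x, idxs) stays ahead of sum2(x[idxs]) at every size.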

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3

Total processing time was 1.31 mins.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('sum2_subset')

sum2_subset - HenrikBengtsson/matrixStats GitHub Wiki
