matrixStats: Benchmark report

varDiff() benchmarks

This report benchmarks the performance of varDiff() against alternative methods.
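For orientation, the quantity being timed can be related to the two base-R calls it is compared against. The following is a minimal sketch, assuming that varDiff() with its default arguments corresponds to var(diff(x)) / 2; that is an assumption about matrixStats, not something this report states, and the helper name var_diff_sketch is hypothetical. For independent values with a common variance, each first-order difference has twice that variance, which is what the division by 2 undoes.

```r
# Hypothetical base-R stand-in; assumes first-order differences and no trimming.
var_diff_sketch <- function(x) {
  d <- diff(x)   # successive differences x[i + 1] - x[i]
  var(d) / 2     # Var(X[i + 1] - X[i]) = 2 * sigma^2 for i.i.d. X
}

set.seed(1)
x <- runif(1e5, min = -100, max = 100)

est_diff  <- var_diff_sketch(x)
est_plain <- var(x)

# Both estimate the same variance, 200^2 / 12 for Uniform(-100, 100)
print(c(diff_based = est_diff, plain = est_plain, theoretical = 200^2 / 12))
```

Under that assumption, var(x) and diff(x) each cover only part of the work, so their timings serve as reference points rather than like-for-like replacements.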

Alternative methods

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
> data <- data[1:4]
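The selection data[1:4] keeps the four smallest vectors. A small sketch of what that implies, assuming mode is "integer" for this section (it is set outside the code shown) and the default scale = 10; the variable names here are illustrative only. It also shows that storage.mode(x) <- "integer" truncates the uniform draws toward zero rather than rounding them.

```r
# Lengths generated by rvectors() with the default scale = 10
scale <- 10
n <- as.integer(scale * c(100, 1000, 10000, 1e+05, 1e+06))
labels <- sprintf("n = %d", n)

# data[1:4] keeps the first four entries, dropping the 10-million-element vector
kept <- labels[1:4]
print(kept)

# Coercing doubles to integer truncates toward zero
x <- c(-99.7, -0.5, 0.5, 99.7)
storage.mode(x) <- "integer"
print(x)
```

The kept labels are the ones used to look up vectors in the results, for example data[["n = 1000"]].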

Results

n = 1000 vector

All elements

> x <- data[["n = 1000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on integer data with n = 1000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	0.009196	0.0096955	0.0103208	0.0098975	0.0103485	0.041973
3	diff	0.011785	0.0125135	0.0137085	0.0131970	0.0139860	0.040098
1	varDiff	0.012780	0.0135215	0.0141981	0.0139610	0.0144280	0.032514

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
3	diff	1.281535	1.290650	1.328243	1.333367	1.351500	0.9553284
1	varDiff	1.389735	1.394616	1.375682	1.410558	1.394212	0.7746408

Figure: Benchmarking of varDiff(), var() and diff() on integer data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.
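The bottom panel of each table appears to be the top panel divided by the var() row, column by column; the printed values are consistent with that reading. A sketch using three columns copied from the n = 1000 integer table (the object names abs_ms and rel are illustrative):

```r
# Absolute timings (ms) copied from the top panel for n = 1000, integer data
abs_ms <- rbind(
  var     = c(min = 0.009196, median = 0.0098975, max = 0.041973),
  diff    = c(min = 0.011785, median = 0.0131970, max = 0.040098),
  varDiff = c(min = 0.012780, median = 0.0139610, max = 0.032514)
)

# Divide every row by the var() row, column by column
rel <- sweep(abs_ms, MARGIN = 2, STATS = abs_ms["var", ], FUN = "/")
print(round(rel, 6))
```

A relative max below 1 therefore only means that the slowest run of that expression was faster than the slowest run of var(), not that the expression is faster overall.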

n = 10000 vector

All elements

> x <- data[["n = 10000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on integer data with n = 10000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	0.045200	0.0465395	0.0477511	0.0469645	0.0477805	0.079852
1	varDiff	0.057211	0.0583670	0.0600053	0.0591335	0.0604620	0.086723
3	diff	0.099175	0.1013915	0.1043614	0.1026815	0.1048740	0.140382

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	varDiff	1.265730	1.254139	1.256627	1.259111	1.265412	1.086047
3	diff	2.194137	2.178612	2.185531	2.186364	2.194912	1.758027

Figure: Benchmarking of varDiff(), var() and diff() on integer data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

All elements

> x <- data[["n = 100000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on integer data with n = 100000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	0.383199	0.396390	0.4407531	0.4204120	0.4543865	0.608532
1	varDiff	0.463319	0.487432	0.5980133	0.5089645	0.5516310	6.296745
3	diff	0.888659	0.923009	1.1508867	0.9727050	1.0786070	7.790968

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.00000
1	varDiff	1.209082	1.229678	1.356799	1.210633	1.214013	10.34743
3	diff	2.319053	2.328538	2.611182	2.313695	2.373766	12.80289

Figure: Benchmarking of varDiff(), var() and diff() on integer data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

All elements

> x <- data[["n = 1000000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on integer data with n = 1000000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	4.041583	4.288999	5.469295	4.622446	5.887681	14.62338
1	varDiff	4.843936	5.389989	6.739284	5.786107	7.697941	14.49699
3	diff	9.188005	10.093262	15.969191	11.128606	17.095768	274.79025

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
1	varDiff	1.198524	1.256701	1.232204	1.251741	1.307466	0.9913573
3	diff	2.273368	2.353291	2.919790	2.407514	2.903651	18.7911624

Figure: Benchmarking of varDiff(), var() and diff() on integer data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
> data <- data[1:4]

Results

n = 1000 vector

All elements

> x <- data[["n = 1000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on double data with n = 1000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	0.008009	0.0083305	0.0090801	0.0085575	0.0092100	0.038115
3	diff	0.011062	0.0120450	0.0130484	0.0125550	0.0131955	0.039849
1	varDiff	0.011718	0.0123075	0.0131388	0.0126810	0.0135545	0.030608

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
3	diff	1.381196	1.445892	1.437032	1.467134	1.432736	1.0454939
1	varDiff	1.463104	1.477402	1.446985	1.481858	1.471715	0.8030434

Figure: Benchmarking of varDiff(), var() and diff() on double data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

All elements

> x <- data[["n = 10000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on double data with n = 10000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	0.035529	0.0358335	0.0367964	0.0361120	0.0370675	0.068853
1	varDiff	0.045229	0.0459790	0.0475691	0.0465260	0.0474675	0.079934
3	diff	0.052307	0.0550845	0.0587569	0.0562325	0.0621470	0.091947

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	varDiff	1.273016	1.283129	1.292764	1.288381	1.280569	1.160937
3	diff	1.472234	1.537235	1.596808	1.557169	1.676590	1.335410

Figure: Benchmarking of varDiff(), var() and diff() on double data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

All elements

> x <- data[["n = 100000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on double data with n = 100000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	0.305579	0.3103610	0.3373515	0.3252020	0.3630905	0.420421
1	varDiff	0.387143	0.4148355	0.4605390	0.4550165	0.4769440	0.621061
3	diff	0.488172	0.5153175	0.9228420	0.5699755	0.6264650	7.441260

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	varDiff	1.266916	1.336623	1.365161	1.399181	1.313568	1.477236
3	diff	1.597531	1.660381	2.735550	1.752681	1.725369	17.699544

Figure: Benchmarking of varDiff(), var() and diff() on double data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

All elements

> x <- data[["n = 1000000"]]
> stats <- microbenchmark(varDiff = varDiff(x), var = var(x), diff = diff(x), unit = "ms")

Table: Benchmarking of varDiff(), var() and diff() on double data with n = 1000000. The top panel shows times in milliseconds; the bottom panel shows times relative to var().

	expr	min	lq	mean	median	uq	max
2	var	3.100131	3.695467	3.876914	3.883719	4.073515	4.99572
1	varDiff	4.027280	4.660096	5.380448	5.038323	5.268857	16.74503
3	diff	6.217304	6.938252	13.940960	12.984295	13.872761	267.65223

	expr	min	lq	mean	median	uq	max
2	var	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	varDiff	1.299068	1.261030	1.387817	1.297294	1.293442	3.351876
3	diff	2.005497	1.877503	3.595891	3.343263	3.405600	53.576308

Figure: Benchmarking of varDiff(), var() and diff() on double data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3

Total processing time was 13.34 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('varDiff')
