matrixStats: Benchmark report

madDiff() benchmarks

This report benchmarks the performance of madDiff() against alternative methods.
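For orientation, the quantity being timed can be sketched in base R. The snippet assumes that madDiff() amounts to the MAD of first-order differences divided by sqrt(2); that definition is not stated in this report, and `mad_of_diff()` is a hypothetical name, not a matrixStats function.

```r
# Hypothetical base-R stand-in; assumes madDiff(x) is roughly mad(diff(x)) / sqrt(2).
mad_of_diff <- function(x) {
  d <- diff(x)        # first-order differences, length(x) - 1 values
  mad(d) / sqrt(2)    # MAD (default constant 1.4826), corrected for the differencing
}

x <- c(1, 2, 4, 7, 11)
est <- mad_of_diff(x)
print(est)
```

Under that assumption, mad() and diff() are natural reference points: together they cover the two steps that madDiff() would combine in a single call.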

Alternative methods

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "integer")
> data <- data[1:4]
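For integer mode, rvector() draws doubles with runif() and then coerces them through `storage.mode<-`, which truncates toward zero rather than rounding. A small base-R check of that behaviour, with illustrative variable names that do not appear in the report:

```r
set.seed(1)
x <- runif(5, min = -100, max = 100)   # doubles, as drawn inside rvector()
y <- x
storage.mode(y) <- "integer"           # same coercion step as in rvector()

# The fractional part is dropped (truncation toward zero), not rounded.
print(rbind(double = x, integer = y))
```

One consequence is that the integer test vectors contain many tied values, whereas the double vectors are effectively tie-free.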

Results

n = 1000 vector

All elements

> x <- data[["n = 1000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on integer data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	0.013223	0.0155665	0.0172980	0.0175410	0.0183605	0.036315
1	madDiff	0.064234	0.0699125	0.0724065	0.0723895	0.0744940	0.094211
2	mad	0.082923	0.0861560	0.0895827	0.0884875	0.0894400	0.243823

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.00000	1.000000	1.000000	1.000000
1	madDiff	4.857748	4.491215	4.18584	4.126874	4.057297	2.594272
2	mad	6.271119	5.534706	5.17880	5.044610	4.871327	6.714113

Figure: Benchmarking of madDiff(), mad() and diff() on integer data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.
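The bottom panel of the table can be recovered from the top panel. The sketch below assumes each entry is divided by the diff() entry in the same column; the report does not state this rule, but it matches the numbers shown.

```r
# Median times in ms, copied from the top panel for integer data, n = 1000.
median_ms <- c(diff = 0.0175410, madDiff = 0.0723895, mad = 0.0884875)

# Relative times: divide every entry by the diff() entry of the same column.
relative <- median_ms / median_ms[["diff"]]
print(round(relative, 6))
```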

n = 10000 vector

All elements

> x <- data[["n = 10000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on integer data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	0.094262	0.1025480	0.1055194	0.1045160	0.106442	0.158033
1	madDiff	0.302049	0.3080060	0.3127393	0.3107785	0.314547	0.352371
2	mad	0.413185	0.4217135	0.4304797	0.4260205	0.431823	0.534026

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	madDiff	3.204356	3.003530	2.963807	2.973502	2.955102	2.229731
2	mad	4.383368	4.112352	4.079624	4.076127	4.056885	3.379206

Figure: Benchmarking of madDiff(), mad() and diff() on integer data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

All elements

> x <- data[["n = 100000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on integer data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	0.909217	0.9403125	3.663575	0.9966065	1.475936	245.250667
1	madDiff	2.428881	2.5457240	2.876464	2.5967320	3.291117	9.349433
2	mad	3.364384	3.5004400	3.890345	3.6280150	3.969575	14.719828

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.0000000	1.000000	1.000000	1.0000000
1	madDiff	2.671399	2.707317	0.7851522	2.605574	2.229852	0.0381219
2	mad	3.700309	3.722635	1.0618985	3.640369	2.689532	0.0600195

Figure: Benchmarking of madDiff(), mad() and diff() on integer data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.
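In the n = 100000 table, the relative mean and max of madDiff() and mad() fall below or close to 1 even though their medians are well above it. This is consistent with at least one slow diff() run (max 245.25 ms) inflating the reference row. The effect of one such run on the mean versus the median can be illustrated with made-up timings; the values below are hypothetical, not measurements from this report.

```r
# Hypothetical timings in ms: 99 typical runs plus one slow run.
timings <- c(rep(1, 99), 245)

print(mean(timings))     # pulled upward by the single slow run
print(median(timings))   # unaffected by it
```

For this reason the median and quartile columns are the safer basis for comparing the three functions.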

n = 1000000 vector

All elements

> x <- data[["n = 1000000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on integer data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	9.392514	9.910151	12.87988	10.90928	16.40246	24.23485
1	madDiff	28.923195	29.745172	33.53503	30.83414	37.18410	53.47256
2	mad	34.370533	35.100619	44.81935	42.43200	43.50615	290.14681

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	madDiff	3.079388	3.001485	2.603675	2.826413	2.266983	2.206432
2	mad	3.659354	3.541885	3.479796	3.889531	2.652416	11.972294

Figure: Benchmarking of madDiff(), mad() and diff() on integer data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = "double")
> data <- data[1:4]

Results

n = 1000 vector

All elements

> x <- data[["n = 1000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on double data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	0.011542	0.0126775	0.0135488	0.0134575	0.0140570	0.026035
1	madDiff	0.084308	0.0853555	0.0871661	0.0866470	0.0880385	0.106187
2	mad	0.091454	0.0931040	0.0955540	0.0938460	0.0950685	0.211817

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	madDiff	7.304453	6.732834	6.433509	6.438566	6.262965	4.078625
2	mad	7.923583	7.344035	7.052603	6.973509	6.763072	8.135856

Figure: Benchmarking of madDiff(), mad() and diff() on double data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

All elements

> x <- data[["n = 10000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on double data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	0.054165	0.0665120	0.0836105	0.0679370	0.071182	0.182388
2	mad	0.454486	0.4621760	0.4938613	0.4683055	0.489220	0.634110
1	madDiff	0.509943	0.5244515	0.5622894	0.5309720	0.555209	0.696625

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	mad	8.390769	6.948761	5.906690	6.893232	6.872805	3.476709
1	madDiff	9.414622	7.885066	6.725106	7.815653	7.799851	3.819467

Figure: Benchmarking of madDiff(), mad() and diff() on double data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

All elements

> x <- data[["n = 100000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on double data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	0.499849	0.530051	0.8593856	0.556828	0.5969075	6.907790
1	madDiff	3.395983	3.496602	3.6983341	3.573189	3.6917735	9.207087
2	mad	4.347775	4.518637	4.8528171	4.596398	4.8296455	10.655482

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	madDiff	6.794018	6.596726	4.303463	6.417043	6.184833	1.332856
2	mad	8.698177	8.524911	5.646845	8.254610	8.091112	1.542531

Figure: Benchmarking of madDiff(), mad() and diff() on double data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

All elements

> x <- data[["n = 1000000"]]
> stats <- microbenchmark(madDiff = madDiff(x), mad = mad(x), diff = diff(x), unit = "ms")

Table: Benchmarking of madDiff(), mad() and diff() on double data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to diff().

	expr	min	lq	mean	median	uq	max
3	diff	5.976937	6.835599	12.88865	9.304882	12.65240	253.96754
2	mad	36.488619	37.669303	47.69525	40.762867	47.01094	299.75383
1	madDiff	36.473325	39.167698	44.67975	45.859020	48.35748	61.25575

	expr	min	lq	mean	median	uq	max
3	diff	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
2	mad	6.104903	5.510753	3.700562	4.380804	3.715573	1.1802840
1	madDiff	6.102344	5.729958	3.466596	4.928490	3.821999	0.2411952

Figure: Benchmarking of madDiff(), mad() and diff() on double data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.
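To compare madDiff() with mad() directly across both data types, the median times reported in the tables can be collected and divided. This is only a convenience summary of numbers already shown; the variable names are illustrative.

```r
# Median times in ms, copied from the tables in this report.
n <- c(1e3, 1e4, 1e5, 1e6)
madDiff_int <- c(0.0723895, 0.3107785, 2.5967320, 30.83414)
mad_int     <- c(0.0884875, 0.4260205, 3.6280150, 42.43200)
madDiff_dbl <- c(0.0866470, 0.5309720, 3.573189, 45.859020)
mad_dbl     <- c(0.0938460, 0.4683055, 4.596398, 40.762867)

# Ratio of medians: values below 1 mean madDiff() had the lower median time.
ratios <- data.frame(
  n = n,
  integer = madDiff_int / mad_int,
  double  = madDiff_dbl / mad_dbl
)
print(round(ratios, 3))
```

The ratios come from a single benchmark session, so small differences around 1 should not be read as systematic.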

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3

Total processing time was 29.77 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('madDiff')

madDiff - HenrikBengtsson/matrixStats GitHub Wiki
