matrixStats: Benchmark report

madDiff() benchmarks on subsetted computation

This report benchmarks the performance of madDiff() on subsetted computation.
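The three call patterns compared throughout this report can be sketched as follows. This is a minimal illustration, assuming the matrixStats package is installed; the vector size, seed and 70% sampling fraction mirror the setup used below, and the result names `r1`, `r2` and `r3` are chosen for illustration only.

```r
library(matrixStats)

set.seed(1)
# Illustrative integer vector, similar in spirit to the report's test data
x <- as.integer(round(runif(1000, min = -100, max = 100)))
# Random 70% subset of the element indices
idxs <- sample.int(length(x), size = length(x) * 0.7)

# (1) Subset ahead of time, then compute (the baseline in the tables)
x_S <- x[idxs]
r1 <- madDiff(x_S)

# (2) Let madDiff() do the subsetting via its 'idxs' argument
r2 <- madDiff(x, idxs = idxs)

# (3) Subset inline in the call
r3 <- madDiff(x[idxs])

# All three approaches should give the same estimate; only the timing differs
stopifnot(isTRUE(all.equal(r1, r2)), isTRUE(all.equal(r1, r3)))
```

Because `x_S` is created before timing starts, approach (1) excludes the cost of subsetting, which is why it serves as the reference in the tables.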

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)
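The call above relies on a `mode` variable that is not defined in the displayed code; it is presumably set by the report template for each data type section. A small sketch of the idea, assuming `mode <- "integer"` and using a simplified stand-in for rvector() (the helper name `rvector_sketch` is hypothetical and not part of matrixStats):

```r
mode <- "integer"  # assumption: set per section by the report template

# Simplified stand-in for rvector(): uniform draws coerced to the given mode
rvector_sketch <- function(n, mode = c("double", "integer"), range = c(-100, 100)) {
  mode <- match.arg(mode)
  x <- runif(n, min = range[1], max = range[2])
  storage.mode(x) <- mode  # for "integer" this truncates toward zero
  x
}

set.seed(1)
x <- rvector_sketch(1000, mode = mode)
str(x)
```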

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3234356 172.8    5709258 305.0  5709258 305.0
Vcells 12934169  98.7   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	0.055190	0.0564065	0.0574278	0.0568090	0.0574705	0.077206
3	madDiff(x[idxs])	0.057079	0.0582120	0.0618246	0.0588055	0.0595500	0.307727
2	madDiff(x, idxs)	0.057541	0.0586050	0.0598237	0.0589665	0.0597920	0.088428

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3	madDiff(x[idxs])	1.034227	1.032009	1.076562	1.035144	1.036184	3.985791
2	madDiff(x, idxs)	1.042598	1.038976	1.041720	1.037978	1.040395	1.145351

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.
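The relative times in the bottom panel appear to be each value of the top panel divided by the corresponding value for madDiff_x_S. A minimal sketch, using the median column of the table above; the variable names are illustrative.

```r
# Median times in milliseconds, copied from the top panel above
median_ms <- c(
  "madDiff_x_S"      = 0.0568090,
  "madDiff(x[idxs])" = 0.0588055,
  "madDiff(x, idxs)" = 0.0589665
)

# Divide by the baseline (pre-subsetted input) to get relative times
relative <- median_ms / median_ms[["madDiff_x_S"]]
print(round(relative, 6))
```

The same division applies to every column, which is why a relative maximum can fall below 1 when the baseline happens to have the largest outlier.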

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232097 172.7    5709258 305.0  5709258 305.0
Vcells 11805394  90.1   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	0.224246	0.231654	0.2367327	0.2368285	0.2414330	0.276658
3	madDiff(x[idxs])	0.235690	0.242172	0.2499910	0.2497330	0.2546955	0.346952
2	madDiff(x, idxs)	0.236465	0.242802	0.2500311	0.2499390	0.2556420	0.275142

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
3	madDiff(x[idxs])	1.051033	1.045404	1.056005	1.054489	1.054932	1.2540827
2	madDiff(x, idxs)	1.054489	1.048123	1.056175	1.055359	1.058853	0.9945203

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232169 172.7    5709258 305.0  5709258 305.0
Vcells 11868954  90.6   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	2.095817	2.243695	2.389170	2.282126	2.348625	8.611197
2	madDiff(x, idxs)	2.230470	2.363672	2.647164	2.423176	2.512975	8.659484
3	madDiff(x[idxs])	2.215638	2.397924	2.642880	2.446655	2.492496	8.566503

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
2	madDiff(x, idxs)	1.064248	1.053473	1.107985	1.061806	1.069977	1.0056075
3	madDiff(x[idxs])	1.057171	1.068738	1.106192	1.072095	1.061257	0.9948098

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232241 172.7    5709258 305.0  5709258 305.0
Vcells 12499003  95.4   24515964 187.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	18.44959	18.95174	24.86004	20.37979	25.57697	292.22811
2	madDiff(x, idxs)	21.57473	22.66015	31.62577	24.88276	29.63818	288.79574
3	madDiff(x[idxs])	21.43227	22.80355	27.10503	27.99392	29.61907	49.22115

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
2	madDiff(x, idxs)	1.169388	1.195676	1.272153	1.220953	1.158784	0.9882545
3	madDiff(x[idxs])	1.161666	1.203243	1.090305	1.373612	1.158037	0.1684340

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3232310 172.7    5709258 305.0  5709258 305.0
Vcells 18799046 143.5   29499156 225.1 57084605 435.6
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	173.4395	192.0317	233.7796	200.0665	221.1509	524.2280
2	madDiff(x, idxs)	290.4968	320.6389	387.9749	334.1310	364.1433	668.4249
3	madDiff(x[idxs])	303.8622	325.1975	387.9358	335.7142	368.6222	651.9459

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	madDiff(x, idxs)	1.674917	1.669719	1.659576	1.670099	1.646583	1.275065
3	madDiff(x[idxs])	1.751978	1.693458	1.659409	1.678013	1.666836	1.243631

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on integer data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> data <- rvectors(mode = mode)

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used (Mb)
Ncells  3232391 172.7    5709258  305  5709258  305
Vcells 17356258 132.5   61598888  470 61463574  469
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	0.070199	0.0715985	0.0727082	0.0722565	0.0730440	0.094207
3	madDiff(x[idxs])	0.072466	0.0737975	0.0762348	0.0743830	0.0751275	0.218112
2	madDiff(x, idxs)	0.072472	0.0740670	0.0748416	0.0745635	0.0752430	0.084222

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3	madDiff(x[idxs])	1.032294	1.030713	1.048503	1.029430	1.028524	2.315242
2	madDiff(x, idxs)	1.032379	1.034477	1.029341	1.031928	1.030105	0.894010

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

> x <- data[["n = 10000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used (Mb)
Ncells  3232457 172.7    5709258  305  5709258  305
Vcells 17365745 132.5   61598888  470 61463574  469
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	0.317826	0.3262430	0.3311348	0.3301385	0.3354795	0.353511
2	madDiff(x, idxs)	0.331684	0.3391555	0.3455184	0.3444140	0.3505610	0.370233
3	madDiff(x[idxs])	0.332179	0.3394500	0.3471538	0.3448690	0.3507025	0.464242

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	madDiff(x, idxs)	1.043602	1.039579	1.043437	1.043241	1.044955	1.047303
3	madDiff(x[idxs])	1.045160	1.040482	1.048376	1.044619	1.045377	1.313232

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used (Mb)
Ncells  3232529 172.7    5709258  305  5709258  305
Vcells 17460624 133.3   61598888  470 61463574  469
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	2.950452	3.116595	3.321669	3.177805	3.455529	4.418886
3	madDiff(x[idxs])	3.139160	3.289204	3.657241	3.368537	3.667559	10.142200
2	madDiff(x, idxs)	3.098777	3.283048	3.520841	3.370732	3.456164	8.919457

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3	madDiff(x[idxs])	1.063959	1.055384	1.101025	1.060020	1.061360	2.295194
2	madDiff(x, idxs)	1.050272	1.053409	1.059962	1.060711	1.000184	2.018485

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3232601 172.7    5709258  305  5709258 305.0
Vcells 18406067 140.5   61598888  470 61572274 469.8
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	25.79942	27.48665	30.40725	28.45360	32.35795	57.97113
3	madDiff(x[idxs])	34.84448	37.06804	42.82830	38.03059	43.09852	321.29250
2	madDiff(x, idxs)	35.60745	37.15369	40.41188	38.27959	43.03884	58.87006

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3	madDiff(x[idxs])	1.350591	1.348583	1.408490	1.336583	1.331930	5.542284
2	madDiff(x, idxs)	1.380165	1.351699	1.329021	1.345334	1.330085	1.015506

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> idxs <- sample.int(length(x), size = length(x) * 0.7)
> x_S <- x[idxs]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3232673 172.7    5709258  305  5709258 305.0
Vcells 27856115 212.6   61598888  470 61572274 469.8
> stats <- microbenchmark(madDiff_x_S = madDiff(x_S), `madDiff(x, idxs)` = madDiff(x, idxs = idxs), 
+     `madDiff(x[idxs])` = madDiff(x[idxs]), unit = "ms")

Table: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows times relative to madDiff_x_S.

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	302.1222	330.8240	395.3541	347.1689	370.4564	618.6347
2	madDiff(x, idxs)	466.7321	496.3730	546.1398	506.1658	522.6195	786.5121
3	madDiff(x[idxs])	475.4246	500.2915	597.8230	516.8004	755.2356	793.2776

	expr	min	lq	mean	median	uq	max
1	madDiff_x_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	madDiff(x, idxs)	1.544846	1.500414	1.381394	1.457981	1.410745	1.271367
3	madDiff(x[idxs])	1.573617	1.512259	1.512120	1.488614	2.038663	1.282304

Figure: Benchmarking of madDiff_x_S, madDiff(x, idxs) and madDiff(x[idxs]) on double data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.
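To see how the subsetting overhead changes with vector size, the relative median times for madDiff(x, idxs) reported in the tables above can be collected side by side. The values are copied from this report; the data frame layout and column names are only an illustration.

```r
# Relative median times of madDiff(x, idxs) versus madDiff_x_S, per table
overhead <- data.frame(
  n       = c(1e3, 1e4, 1e5, 1e6, 1e7),
  integer = c(1.037978, 1.055359, 1.061806, 1.220953, 1.670099),
  double  = c(1.031928, 1.043241, 1.060711, 1.345334, 1.457981)
)

# Overhead in percent relative to the pre-subsetted baseline
overhead$integer_pct <- round(100 * (overhead$integer - 1), 1)
overhead$double_pct  <- round(100 * (overhead$double - 1), 1)
print(overhead)
```

In these runs the overhead stays at roughly 3% to 6% up to n = 100000 and rises to about 46% (double) and 67% (integer) at n = 10000000.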

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3

Total processing time was 4.82 mins.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('madDiff_subset')
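For a quick check without regenerating the full report, a single benchmark can be approximated by hand. This is a rough sketch, assuming the matrixStats and microbenchmark packages are installed; the `times` value is an arbitrary choice to keep the run short, and timings will differ from the tables above depending on hardware and package versions.

```r
library(matrixStats)
library(microbenchmark)

set.seed(1)
x <- runif(10000, min = -100, max = 100)
storage.mode(x) <- "integer"  # same coercion as rvector() in this report
idxs <- sample.int(length(x), size = length(x) * 0.7)
x_S <- x[idxs]

# Same three expressions as in the report, with fewer repetitions
stats <- microbenchmark(
  madDiff_x_S        = madDiff(x_S),
  `madDiff(x, idxs)` = madDiff(x, idxs = idxs),
  `madDiff(x[idxs])` = madDiff(x[idxs]),
  unit = "ms", times = 20L
)
print(summary(stats))
```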

madDiff_subset - HenrikBengtsson/matrixStats GitHub Wiki
