anyMissing - HenrikBengtsson/matrixStats GitHub Wiki
matrixStats: Benchmark report
This report benchmarks the performance of anyMissing() against alternative methods.
- anyNA()
- any() + is.na()
where any() + is.na() is wrapped in a helper function, as defined below:
> any_is.na <- function(x) {
+     any(is.na(x))
+ }
> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     }     else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> # `mode` is "integer" in this part of the report (see the table captions)
> data <- rvectors(mode = mode)
> x <- data[["n = 1000"]]
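The call above leaves `na_prob` at its default of 0, so the benchmarked vectors contain no missing values, and each method has to inspect every element before it can return FALSE. The following is a minimal sketch of that observation, not part of the report itself. `make_vector()` is a hypothetical, simplified stand-in for `rvector()`, and the `anyMissing()` comparison runs only if matrixStats is installed.

```r
# Hypothetical, simplified stand-in for the report's rvector()
# (integer/double only; fixed range of -100 to 100).
make_vector <- function(n, mode = c("integer", "double"), na_prob = 0) {
  mode <- match.arg(mode)
  x <- runif(n, min = -100, max = 100)
  storage.mode(x) <- mode
  n_na <- floor(na_prob * n)
  if (n_na > 0) x[sample(n, size = n_na)] <- NA
  x
}

any_is.na <- function(x) any(is.na(x))

set.seed(1)
x_clean <- make_vector(1000, mode = "integer")                 # na_prob = 0, as in the report
x_dirty <- make_vector(1000, mode = "integer", na_prob = 0.1)  # 10% missing values

# The two base-R approaches must agree on every input.
stopifnot(identical(anyNA(x_clean), any_is.na(x_clean)))
stopifnot(identical(anyNA(x_dirty), any_is.na(x_dirty)))

res_clean <- anyNA(x_clean)
res_dirty <- anyNA(x_dirty)

# Compare against anyMissing() only when matrixStats is available.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  stopifnot(identical(matrixStats::anyMissing(x_clean), res_clean))
  stopifnot(identical(matrixStats::anyMissing(x_dirty), res_dirty))
}
```

Timings on vectors that do contain missing values may differ from the ones reported here, because a method that stops at the first NA would not need to scan the full vector.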
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3053217 163.1    5709258 305.0  5709258 305.0
Vcells 32116601 245.1   54055058 412.5 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.000359 | 0.000365 | 0.0004554 | 0.0003780 | 0.0003855 | 0.008082 | 
| 1 | anyMissing | 0.000934 | 0.000949 | 0.0010638 | 0.0009780 | 0.0010110 | 0.007846 | 
| 3 | any_is.na | 0.002283 | 0.002359 | 0.0025444 | 0.0024015 | 0.0024980 | 0.011989 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0000000 | 
| 1 | anyMissing | 2.601671 | 2.600000 | 2.336181 | 2.587302 | 2.622568 | 0.9707993 | 
| 3 | any_is.na | 6.359331 | 6.463014 | 5.587509 | 6.353175 | 6.479896 | 1.4834199 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 1000 data. Outliers are displayed as crosses. Times are in milliseconds.
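The relative times in the bottom panel appear to be the absolute times divided, column by column, by the anyNA row. As a small worked check (a sketch using only the median column of the integer, n = 1000 table; the variable names are illustrative):

```r
# Median timings in milliseconds, copied from the integer, n = 1000 table.
median_ms <- c(anyNA = 0.0003780, anyMissing = 0.0009780, any_is.na = 0.0024015)

# Divide every entry by the anyNA entry to obtain relative times.
relative <- median_ms / median_ms[["anyNA"]]
print(round(relative, 6))
```

The resulting ratios for anyMissing and any_is.na can be compared with the median column of the relative-times panel.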

> x <- data[["n = 10000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050100 162.9    5709258  305  5709258 305.0
Vcells 10479057  80.0   43244047  330 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.002735 | 0.0027680 | 0.0028186 | 0.0027845 | 0.0028250 | 0.004756 | 
| 1 | anyMissing | 0.005714 | 0.0057795 | 0.0059961 | 0.0058725 | 0.0059715 | 0.015570 | 
| 3 | any_is.na | 0.017282 | 0.0176325 | 0.0183324 | 0.0177625 | 0.0179375 | 0.039756 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 2.089214 | 2.087970 | 2.127348 | 2.108996 | 2.113805 | 3.273760 | 
| 3 | any_is.na | 6.318830 | 6.370123 | 6.504126 | 6.379063 | 6.349558 | 8.359125 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 10000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 100000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050172 162.9    5709258  305  5709258 305.0
Vcells 10479617  80.0   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.026117 | 0.0261990 | 0.0263233 | 0.026240 | 0.0263320 | 0.028237 | 
| 1 | anyMissing | 0.052362 | 0.0524585 | 0.0534622 | 0.052539 | 0.0527015 | 0.088928 | 
| 3 | any_is.na | 0.165082 | 0.1665155 | 0.2059161 | 0.167865 | 0.2737035 | 0.319562 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 2.004901 | 2.002309 | 2.030979 | 2.002248 | 2.001424 | 3.149343 | 
| 3 | any_is.na | 6.320864 | 6.355796 | 7.822567 | 6.397294 | 10.394330 | 11.317137 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 100000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 1000000"]]
> gc()
           used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050244  163    5709258  305  5709258 305.0
Vcells 10479666   80   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.259394 | 0.271498 | 0.3292668 | 0.3153795 | 0.3550220 | 0.741998 | 
| 1 | anyMissing | 0.514958 | 0.521802 | 0.5666086 | 0.5361985 | 0.5619805 | 0.933253 | 
| 3 | any_is.na | 1.647475 | 2.685594 | 2.9601964 | 2.7435330 | 2.9737855 | 12.192548 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 1.985235 | 1.921937 | 1.720819 | 1.700169 | 1.582946 | 1.257757 | 
| 3 | any_is.na | 6.351246 | 9.891763 | 8.990265 | 8.699148 | 8.376342 | 16.432050 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 10000000"]]
> gc()
           used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050316  163    5709258  305  5709258 305.0
Vcells 10479714   80   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 3.192648 | 3.210021 | 3.290707 | 3.221774 | 3.278016 | 3.804310 | 
| 1 | anyMissing | 5.256905 | 5.275116 | 5.377196 | 5.310230 | 5.413356 | 6.648744 | 
| 3 | any_is.na | 26.604766 | 27.029503 | 29.775117 | 27.262952 | 30.022492 | 40.988404 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 1.646566 | 1.643328 | 1.634055 | 1.648231 | 1.651412 | 1.747687 | 
| 3 | any_is.na | 8.333135 | 8.420352 | 9.048243 | 8.462092 | 9.158739 | 10.774202 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer+n = 10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
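For readers who want to rerun a single comparison outside the report framework, the sketch below repeats the `microbenchmark()` call from the transcript on a freshly generated integer vector. It assumes the microbenchmark and matrixStats packages are installed and skips the timing step otherwise. The vector here is generated directly rather than through `rvectors()`, so absolute timings will not match the tables.

```r
any_is.na <- function(x) any(is.na(x))

set.seed(1)
# Integer vector without missing values, analogous to the "n = 10000" case.
x <- as.integer(runif(1e4, min = -100, max = 100))

# Run the timing comparison only if both packages are available.
have_pkgs <- requireNamespace("microbenchmark", quietly = TRUE) &&
  requireNamespace("matrixStats", quietly = TRUE)

if (have_pkgs) {
  stats <- microbenchmark::microbenchmark(
    anyMissing = matrixStats::anyMissing(x),
    anyNA      = anyNA(x),
    any_is.na  = any_is.na(x),
    unit = "ms"
  )
  print(stats)
} else {
  message("Install 'microbenchmark' and 'matrixStats' to run the timing comparison.")
}
```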

> # `mode` is "double" in this part of the report (see the table captions)
> data <- rvectors(mode = mode)
> x <- data[["n = 1000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050391 163.0    5709258  305  5709258 305.0
Vcells 16035729 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.000505 | 0.0005205 | 0.0006173 | 0.0005375 | 0.0005515 | 0.008371 | 
| 1 | anyMissing | 0.000937 | 0.0009790 | 0.0011860 | 0.0010030 | 0.0010505 | 0.016060 | 
| 3 | any_is.na | 0.002244 | 0.0023690 | 0.0026005 | 0.0024175 | 0.0025095 | 0.015431 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 1.855446 | 1.880884 | 1.921335 | 1.866046 | 1.904805 | 1.918528 | 
| 3 | any_is.na | 4.443564 | 4.551393 | 4.212717 | 4.497674 | 4.550317 | 1.843388 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 1000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 10000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050460 163.0    5709258  305  5709258 305.0
Vcells 16035771 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 10000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.004266 | 0.0042960 | 0.0043472 | 0.0043225 | 0.0043570 | 0.005866 | 
| 1 | anyMissing | 0.005720 | 0.0058035 | 0.0059647 | 0.0058815 | 0.0059995 | 0.011991 | 
| 3 | any_is.na | 0.017302 | 0.0175330 | 0.0179555 | 0.0176775 | 0.0178685 | 0.030253 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 1.340834 | 1.350908 | 1.372075 | 1.360671 | 1.376980 | 2.044153 | 
| 3 | any_is.na | 4.055790 | 4.081238 | 4.130334 | 4.089647 | 4.101102 | 5.157347 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 10000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 100000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050532 163.0    5709258  305  5709258 305.0
Vcells 16036122 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 100000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.041876 | 0.0420325 | 0.0422432 | 0.0421405 | 0.0422655 | 0.044177 | 
| 1 | anyMissing | 0.052454 | 0.0525240 | 0.0531367 | 0.0526505 | 0.0529340 | 0.077419 | 
| 3 | any_is.na | 0.164455 | 0.1668220 | 0.2142914 | 0.1705240 | 0.2727585 | 0.282729 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 1.252603 | 1.249605 | 1.257877 | 1.249404 | 1.252416 | 1.752473 | 
| 3 | any_is.na | 3.927190 | 3.968881 | 5.072808 | 4.046558 | 6.453455 | 6.399914 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 100000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 1000000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050604 163.0    5709258  305  5709258 305.0
Vcells 16036530 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 1000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 0.456717 | 0.5120315 | 0.5419385 | 0.5531520 | 0.5685105 | 0.685328 | 
| 1 | anyMissing | 0.545810 | 0.5804885 | 0.6082866 | 0.6047615 | 0.6236790 | 1.252541 | 
| 3 | any_is.na | 1.709426 | 2.7450395 | 2.7931772 | 2.8021115 | 2.8304415 | 9.515550 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 
| 1 | anyMissing | 1.195073 | 1.133697 | 1.122427 | 1.093301 | 1.097040 | 1.827652 | 
| 3 | any_is.na | 3.742856 | 5.361075 | 5.154048 | 5.065717 | 4.978697 | 13.884665 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 1000000 data. Outliers are displayed as crosses. Times are in milliseconds.

> x <- data[["n = 10000000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050676 163.0    5709258  305  5709258 305.0
Vcells 16036578 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")
Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 10000000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 5.540831 | 5.668875 | 5.864548 | 5.749500 | 6.116903 | 7.339590 | 
| 1 | anyMissing | 5.925187 | 6.040297 | 6.211277 | 6.157669 | 6.256903 | 7.233605 | 
| 3 | any_is.na | 27.431948 | 28.496416 | 33.181975 | 28.905825 | 35.312683 | 250.630057 |

| | expr | min | lq | mean | median | uq | max |
|---|---|---|---|---|---|---|---|
| 2 | anyNA | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0000000 | 
| 1 | anyMissing | 1.069368 | 1.065520 | 1.059123 | 1.070992 | 1.022887 | 0.9855598 | 
| 3 | any_is.na | 4.950873 | 5.026821 | 5.658062 | 5.027537 | 5.772968 | 34.1476918 | 
Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double+n = 10000000 data. Outliers are displayed as crosses. Times are in milliseconds.
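To see how anyMissing() compares with anyNA() as the vectors grow, the relative median times can be collected from the tables above into one small data frame. This is only a sketch that restates numbers already shown in the report; the column and variable names are illustrative.

```r
# Median time of anyMissing() relative to anyNA(), read off the
# relative-times tables above (one value per vector length).
rel_median <- data.frame(
  n       = c(1e3, 1e4, 1e5, 1e6, 1e7),
  integer = c(2.587302, 2.108996, 2.002248, 1.700169, 1.648231),
  double  = c(1.866046, 1.360671, 1.249404, 1.093301, 1.070992)
)
print(rel_median)

# TRUE for a data type if the ratio decreases with every step up in n.
shrinks <- vapply(
  rel_median[c("integer", "double")],
  function(r) all(diff(r) < 0),
  logical(1)
)
print(shrinks)
```

In these particular runs the ratio falls steadily for both data types, and for double vectors of length 10000000 the two functions are within about 7% of each other at the median.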

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     
loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         
Total processing time was 17.93 secs.
To reproduce this report, do:
html <- matrixStats:::benchmark('anyMissing')
Copyright Henrik Bengtsson. Last updated on 2019-09-10 20:33:53 (-0700 UTC). Powered by RSP.