matrixStats: Benchmark report


anyMissing() benchmarks

This report benchmarks the performance of anyMissing() against alternative methods.

Alternative methods

  • anyNA()
  • any() + is.na()

where any() + is.na() is defined as below:

> any_is.na <- function(x) {
+     any(is.na(x))
+ }
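
As a quick sanity check, the sketch below confirms that the two base R alternatives agree on small inputs with and without a missing value. The vectors x_clean and x_na are illustrative names that do not appear in the report, and the anyMissing() call is guarded on the assumption that matrixStats may not be installed.

```r
# any() + is.na(), as defined above
any_is.na <- function(x) {
    any(is.na(x))
}

# Illustrative inputs (not part of the benchmark data)
x_clean <- c(1L, 2L, 3L)
x_na <- c(1L, NA, 3L)

# Both base R approaches should return the same logical scalar
res_clean <- c(anyNA = anyNA(x_clean), any_is.na = any_is.na(x_clean))
res_na <- c(anyNA = anyNA(x_na), any_is.na = any_is.na(x_na))
print(res_clean)
print(res_na)

# anyMissing() is only called when matrixStats is available (assumption)
if (requireNamespace("matrixStats", quietly = TRUE)) {
    print(matrixStats::anyMissing(x_na))
}
```

Note that any(is.na(x)) first builds a full logical vector of length(x) before reducing it, which is one plausible reason it trails the other two methods in the results that follow.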

Data type "integer"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> mode <- "integer"
> data <- rvectors(mode = mode)
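
Because rvectors() is called without na_prob, the default na_prob = 0 applies and none of the benchmark vectors contain missing values, so every method has to scan the whole vector. The sketch below uses a condensed copy of rvector() (numeric modes only) to contrast that case with a hypothetical na_prob = 0.1 variant; x0 and x1 are illustrative names and the variant is not benchmarked in this report.

```r
# Condensed copy of rvector() from above, restricted to numeric modes
rvector <- function(n, mode = c("double", "integer"), range = c(-100, +100), na_prob = 0) {
    mode <- match.arg(mode)
    x <- runif(n, min = range[1], max = range[2])
    storage.mode(x) <- mode
    if (na_prob > 0)
        x[sample(n, size = na_prob * n)] <- NA
    x
}

set.seed(1)
x0 <- rvector(n = 1000, mode = "integer")                 # as benchmarked
x1 <- rvector(n = 1000, mode = "integer", na_prob = 0.1)  # hypothetical variant

print(anyNA(x0))       # FALSE: no missing values to find
print(sum(is.na(x1)))  # 100: 10% of the 1000 elements were set to NA
```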

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> gc()
           used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  3053217 163.1    5709258 305.0  5709258 305.0
Vcells 32116601 245.1   54055058 412.5 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min       lq      mean    median        uq      max
2 anyNA      0.000359 0.000365 0.0004554 0.0003780 0.0003855 0.008082
1 anyMissing 0.000934 0.000949 0.0010638 0.0009780 0.0010110 0.007846
3 any_is.na  0.002283 0.002359 0.0025444 0.0024015 0.0024980 0.011989

  expr            min       lq     mean   median       uq       max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000
1 anyMissing 2.601671 2.600000 2.336181 2.587302 2.622568 0.9707993
3 any_is.na  6.359331 6.463014 5.587509 6.353175 6.479896 1.4834199

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.
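
The relative times in the bottom panel appear to be each entry divided by the corresponding anyNA() entry, which serves as the reference row. A minimal check of that reading, using the median times reported for integer data with n = 1000 (median_ms and relative are illustrative names):

```r
# Median times (ms) copied from the integer, n = 1000 table above
median_ms <- c(anyNA = 0.0003780, anyMissing = 0.0009780, any_is.na = 0.0024015)

# Relative time = time divided by the time of the reference method (anyNA)
relative <- median_ms / median_ms[["anyNA"]]
print(round(relative, 6))
```

The result matches the median column of the bottom panel (1.000000, 2.587302 and 6.353175), and the same division applies column by column in the other tables.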

n = 10000 vector

> x <- data[["n = 10000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050100 162.9    5709258  305  5709258 305.0
Vcells 10479057  80.0   43244047  330 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min        lq      mean    median        uq      max
2 anyNA      0.002735 0.0027680 0.0028186 0.0027845 0.0028250 0.004756
1 anyMissing 0.005714 0.0057795 0.0059961 0.0058725 0.0059715 0.015570
3 any_is.na  0.017282 0.0176325 0.0183324 0.0177625 0.0179375 0.039756

  expr            min       lq     mean   median       uq      max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 anyMissing 2.089214 2.087970 2.127348 2.108996 2.113805 3.273760
3 any_is.na  6.318830 6.370123 6.504126 6.379063 6.349558 8.359125

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050172 162.9    5709258  305  5709258 305.0
Vcells 10479617  80.0   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min        lq      mean   median        uq      max
2 anyNA      0.026117 0.0261990 0.0263233 0.026240 0.0263320 0.028237
1 anyMissing 0.052362 0.0524585 0.0534622 0.052539 0.0527015 0.088928
3 any_is.na  0.165082 0.1665155 0.2059161 0.167865 0.2737035 0.319562

  expr            min       lq     mean   median        uq       max
2 anyNA      1.000000 1.000000 1.000000 1.000000  1.000000  1.000000
1 anyMissing 2.004901 2.002309 2.030979 2.002248  2.001424  3.149343
3 any_is.na  6.320864 6.355796 7.822567 6.397294 10.394330 11.317137

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> gc()
           used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050244  163    5709258  305  5709258 305.0
Vcells 10479666   80   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min       lq      mean    median        uq       max
2 anyNA      0.259394 0.271498 0.3292668 0.3153795 0.3550220  0.741998
1 anyMissing 0.514958 0.521802 0.5666086 0.5361985 0.5619805  0.933253
3 any_is.na  1.647475 2.685594 2.9601964 2.7435330 2.9737855 12.192548

  expr            min       lq     mean   median       uq       max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000  1.000000
1 anyMissing 1.985235 1.921937 1.720819 1.700169 1.582946  1.257757
3 any_is.na  6.351246 9.891763 8.990265 8.699148 8.376342 16.432050

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> gc()
           used (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050316  163    5709258  305  5709258 305.0
Vcells 10479714   80   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr             min        lq      mean    median        uq       max
2 anyNA       3.192648  3.210021  3.290707  3.221774  3.278016  3.804310
1 anyMissing  5.256905  5.275116  5.377196  5.310230  5.413356  6.648744
3 any_is.na  26.604766 27.029503 29.775117 27.262952 30.022492 40.988404

  expr            min       lq     mean   median       uq       max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000  1.000000
1 anyMissing 1.646566 1.643328 1.634055 1.648231 1.651412  1.747687
3 any_is.na  8.333135 8.420352 9.048243 8.462092 9.158739 10.774202

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on integer data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.

Data type "double"

Data

> rvector <- function(n, mode = c("logical", "double", "integer"), range = c(-100, +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     } else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     x
+ }
> rvectors <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rvector(n = scale * 100, ...)
+     data[[2]] <- rvector(n = scale * 1000, ...)
+     data[[3]] <- rvector(n = scale * 10000, ...)
+     data[[4]] <- rvector(n = scale * 1e+05, ...)
+     data[[5]] <- rvector(n = scale * 1e+06, ...)
+     names(data) <- sprintf("n = %d", sapply(data, FUN = length))
+     data
+ }
> mode <- "double"
> data <- rvectors(mode = mode)

Results

n = 1000 vector

> x <- data[["n = 1000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050391 163.0    5709258  305  5709258 305.0
Vcells 16035729 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 1000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min        lq      mean    median        uq      max
2 anyNA      0.000505 0.0005205 0.0006173 0.0005375 0.0005515 0.008371
1 anyMissing 0.000937 0.0009790 0.0011860 0.0010030 0.0010505 0.016060
3 any_is.na  0.002244 0.0023690 0.0026005 0.0024175 0.0025095 0.015431

  expr            min       lq     mean   median       uq      max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 anyMissing 1.855446 1.880884 1.921335 1.866046 1.904805 1.918528
3 any_is.na  4.443564 4.551393 4.212717 4.497674 4.550317 1.843388

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 1000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000 vector

> x <- data[["n = 10000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050460 163.0    5709258  305  5709258 305.0
Vcells 16035771 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 10000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min        lq      mean    median        uq      max
2 anyNA      0.004266 0.0042960 0.0043472 0.0043225 0.0043570 0.005866
1 anyMissing 0.005720 0.0058035 0.0059647 0.0058815 0.0059995 0.011991
3 any_is.na  0.017302 0.0175330 0.0179555 0.0176775 0.0178685 0.030253

  expr            min       lq     mean   median       uq      max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 anyMissing 1.340834 1.350908 1.372075 1.360671 1.376980 2.044153
3 any_is.na  4.055790 4.081238 4.130334 4.089647 4.101102 5.157347

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 10000. Outliers are displayed as crosses. Times are in milliseconds.

n = 100000 vector

> x <- data[["n = 100000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050532 163.0    5709258  305  5709258 305.0
Vcells 16036122 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 100000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min        lq      mean    median        uq      max
2 anyNA      0.041876 0.0420325 0.0422432 0.0421405 0.0422655 0.044177
1 anyMissing 0.052454 0.0525240 0.0531367 0.0526505 0.0529340 0.077419
3 any_is.na  0.164455 0.1668220 0.2142914 0.1705240 0.2727585 0.282729

  expr            min       lq     mean   median       uq      max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 anyMissing 1.252603 1.249605 1.257877 1.249404 1.252416 1.752473
3 any_is.na  3.927190 3.968881 5.072808 4.046558 6.453455 6.399914

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 100000. Outliers are displayed as crosses. Times are in milliseconds.

n = 1000000 vector

> x <- data[["n = 1000000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050604 163.0    5709258  305  5709258 305.0
Vcells 16036530 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 1000000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr            min        lq      mean    median        uq      max
2 anyNA      0.456717 0.5120315 0.5419385 0.5531520 0.5685105 0.685328
1 anyMissing 0.545810 0.5804885 0.6082866 0.6047615 0.6236790 1.252541
3 any_is.na  1.709426 2.7450395 2.7931772 2.8021115 2.8304415 9.515550

  expr            min       lq     mean   median       uq       max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000  1.000000
1 anyMissing 1.195073 1.133697 1.122427 1.093301 1.097040  1.827652
3 any_is.na  3.742856 5.361075 5.154048 5.065717 4.978697 13.884665

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 1000000. Outliers are displayed as crosses. Times are in milliseconds.

n = 10000000 vector

> x <- data[["n = 10000000"]]
> gc()
           used  (Mb) gc trigger (Mb) max used  (Mb)
Ncells  3050676 163.0    5709258  305  5709258 305.0
Vcells 16036578 122.4   34595238  264 56666022 432.4
> stats <- microbenchmark(anyMissing = anyMissing(x), anyNA = anyNA(x), any_is.na = any_is.na(x), unit = "ms")

Table: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 10000000. The top panel shows times in milliseconds and the bottom panel shows relative times.

  expr             min        lq      mean    median        uq        max
2 anyNA       5.540831  5.668875  5.864548  5.749500  6.116903   7.339590
1 anyMissing  5.925187  6.040297  6.211277  6.157669  6.256903   7.233605
3 any_is.na  27.431948 28.496416 33.181975 28.905825 35.312683 250.630057

  expr            min       lq     mean   median       uq        max
2 anyNA      1.000000 1.000000 1.000000 1.000000 1.000000  1.0000000
1 anyMissing 1.069368 1.065520 1.059123 1.070992 1.022887  0.9855598
3 any_is.na  4.950873 5.026821 5.658062 5.027537 5.772968 34.1476918

Figure: Benchmarking of anyMissing(), anyNA() and any_is.na() on double data with n = 10000000. Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3         

Total processing time was 17.93 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('anyMissing')
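
If only a quick comparison is wanted rather than the full report, a minimal sketch along the following lines may be enough. It is not the report's own procedure: the vector x and the 100-iteration fallback loop are illustrative choices, and the microbenchmark package is assumed to be optional, with coarse base R timing used when it is missing.

```r
# any() + is.na(), as defined in the report
any_is.na <- function(x) any(is.na(x))

# Illustrative integer vector without missing values
set.seed(1)
x <- as.integer(runif(1e5, min = -100, max = 100))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
    # Same style of call as in the report, restricted to base R methods
    stats <- microbenchmark::microbenchmark(
        anyNA = anyNA(x),
        any_is.na = any_is.na(x),
        unit = "ms"
    )
    print(stats)
} else {
    # Fallback: coarse elapsed time (seconds) over 100 repetitions
    t_anyNA <- system.time(for (i in 1:100) anyNA(x))[["elapsed"]]
    t_any_is.na <- system.time(for (i in 1:100) any_is.na(x))[["elapsed"]]
    print(c(anyNA = t_anyNA, any_is.na = t_any_is.na))
}
```

Absolute timings will differ from those reported above, since they depend on the machine and the R version.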

Copyright Henrik Bengtsson. Last updated on 2019-09-10 20:33:53 (-0700 UTC). Powered by RSP.
