matrixStats: Benchmark report

colTabulates() and rowTabulates() benchmarks on subsetted computation

This report benchmarks the performance of colTabulates() and rowTabulates() on subsetted computation.

Data

> rmatrix <- function(nrow, ncol, mode = c("logical", "double", "integer", "index"), range = c(-100, 
+     +100), na_prob = 0) {
+     mode <- match.arg(mode)
+     n <- nrow * ncol
+     if (mode == "logical") {
+         x <- sample(c(FALSE, TRUE), size = n, replace = TRUE)
+     }     else if (mode == "index") {
+         x <- seq_len(n)
+         mode <- "integer"
+     }     else {
+         x <- runif(n, min = range[1], max = range[2])
+     }
+     storage.mode(x) <- mode
+     if (na_prob > 0) 
+         x[sample(n, size = na_prob * n)] <- NA
+     dim(x) <- c(nrow, ncol)
+     x
+ }
> rmatrices <- function(scale = 10, seed = 1, ...) {
+     set.seed(seed)
+     data <- list()
+     data[[1]] <- rmatrix(nrow = scale * 1, ncol = scale * 1, ...)
+     data[[2]] <- rmatrix(nrow = scale * 10, ncol = scale * 10, ...)
+     data[[3]] <- rmatrix(nrow = scale * 100, ncol = scale * 1, ...)
+     data[[4]] <- t(data[[3]])
+     data[[5]] <- rmatrix(nrow = scale * 10, ncol = scale * 100, ...)
+     data[[6]] <- t(data[[5]])
+     names(data) <- sapply(data, FUN = function(x) paste(dim(x), collapse = "x"))
+     data
+ }
> data <- rmatrices(mode = "integer", range = c(-10, 10))
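Because `mode = "integer"` draws from `runif()` and then coerces the result through `storage.mode<-`, the values are truncated toward zero. With `range = c(-10, 10)` the benchmark matrices therefore hold integers in -9..9, so each tabulation runs over at most 19 distinct values. A minimal base-R sketch of that step (the vector length of 1e4 is an arbitrary choice for illustration):

```r
set.seed(1)

# Same kind of draw that rmatrix() uses for non-logical, non-index modes
x <- runif(1e4, min = -10, max = 10)

# Coercing to integer truncates toward zero (e.g. -9.7 -> -9, 9.7 -> 9)
storage.mode(x) <- "integer"

is.integer(x)      # TRUE
range(x)           # stays within -9..9
length(unique(x))  # at most 19 distinct values
```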

Results

10x10 matrix

> X <- data[["10x10"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3205801 171.3    5709258 305.0  5709258 305.0
Vcells 6502589  49.7   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204525 171.2    5709258 305.0  5709258 305.0
Vcells 6498843  49.6   22343563 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")
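The three timed expressions differ only in where the subsetting happens: on a pre-subsetted matrix (`X_S`), inside `colTabulates()` through its `rows`/`cols` arguments, or through `X[rows, cols]` at call time. A minimal sketch, assuming matrixStats is installed, that checks the three forms tabulate the same counts on a small matrix (the toy matrix and the fixed `values` vector are illustrative choices, not part of the benchmark):

```r
library(matrixStats)

set.seed(1)
X <- matrix(sample(-9:9, size = 100, replace = TRUE), nrow = 10, ncol = 10)
rows <- sample.int(nrow(X), size = 7L)
cols <- sample.int(ncol(X), size = 7L)
X_S <- X[rows, cols]

# Fix the tabulated values so every result has the same layout
vals <- -9:9

a  <- colTabulates(X_S, values = vals)
b  <- colTabulates(X, rows = rows, cols = cols, values = vals)
c3 <- colTabulates(X[rows, cols], values = vals)

# Compare counts only; dimnames are dropped in case they differ by version
stopifnot(identical(unname(a), unname(b)))
stopifnot(identical(unname(a), unname(c3)))
```

If the check passes, any timing difference between the three expressions reflects the cost of subsetting rather than a difference in what is computed.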

Table: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	0.087540	0.0889065	0.0980791	0.0896510	0.0908495	0.429847
2	colTabulates(X, rows, cols)	0.089037	0.0903495	0.0971355	0.0909390	0.0921390	0.219770
3	colTabulates(X[rows, cols])	0.089401	0.0902440	0.1008590	0.0912245	0.0924605	0.288946

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.0000000	1.000000	1.000000	1.0000000
2	colTabulates(X, rows, cols)	1.017101	1.016231	0.9903792	1.014367	1.014194	0.5112749
3	colTabulates(X[rows, cols])	1.021259	1.015044	1.0283440	1.017551	1.017733	0.6722066
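The relative panel appears to be each statistic divided by the corresponding value in the first row (the pre-subsetted baseline). A short base-R sketch that reproduces it from the top panel of the 10x10 colTabulates table; the numbers are copied from that table and the object names are illustrative:

```r
# Top panel of the 10x10 colTabulates table (times in ms)
ms <- rbind(
  "colTabulates_X_S"            = c(0.087540, 0.0889065, 0.0980791, 0.0896510, 0.0908495, 0.429847),
  "colTabulates(X, rows, cols)" = c(0.089037, 0.0903495, 0.0971355, 0.0909390, 0.0921390, 0.219770),
  "colTabulates(X[rows, cols])" = c(0.089401, 0.0902440, 0.1008590, 0.0912245, 0.0924605, 0.288946)
)
colnames(ms) <- c("min", "lq", "mean", "median", "uq", "max")

# Divide every statistic by the corresponding value of the first row
rel <- sweep(ms, MARGIN = 2, STATS = ms[1, ], FUN = "/")
round(rel, 6)
```

Read this way, a value below 1 in the `max` column only says that the baseline's single slowest run was slower than the variant's slowest run; it does not indicate that the variant is faster overall.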

Table: Benchmarking of rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 10x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	0.085704	0.0866480	0.0883837	0.0874370	0.0881800	0.102645
3	rowTabulates(X[cols, rows])	0.086742	0.0880295	0.0892940	0.0885225	0.0892075	0.130317
2	rowTabulates(X, cols, rows)	0.086714	0.0881100	0.0931348	0.0885260	0.0894770	0.420335

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
3	rowTabulates(X[cols, rows])	1.012112	1.015944	1.010300	1.012415	1.011652	1.269589
2	rowTabulates(X, cols, rows)	1.011785	1.016873	1.053755	1.012455	1.014709	4.095036

Figure: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x10 data as well as rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 10x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
2	rowTabulates_X_S	85.704	86.6480	88.38371	87.437	88.1800	102.645
1	colTabulates_X_S	87.540	88.9065	98.07909	89.651	90.8495	429.847

	expr	min	lq	mean	median	uq	max
2	rowTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	colTabulates_X_S	1.021423	1.026065	1.109697	1.025321	1.030273	4.187705

Figure: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 10x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

100x100 matrix

> X <- data[["100x100"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3203486 171.1    5709258 305.0  5709258 305.0
Vcells 6167977  47.1   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3203477 171.1    5709258 305.0  5709258 305.0
Vcells 6173055  47.1   22343563 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	0.253419	0.2609700	0.2726863	0.2656830	0.280109	0.362675
2	colTabulates(X, rows, cols)	0.262151	0.2719990	0.2841614	0.2791645	0.293225	0.412909
3	colTabulates(X[rows, cols])	0.261782	0.2737665	0.2917133	0.2794695	0.294835	0.622681

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	colTabulates(X, rows, cols)	1.034457	1.042262	1.042082	1.050743	1.046825	1.138510
3	colTabulates(X[rows, cols])	1.033001	1.049034	1.069776	1.051891	1.052572	1.716912

Table: Benchmarking of rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 100x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	0.276059	0.2857250	0.2933003	0.2923040	0.2996070	0.335534
2	rowTabulates(X, cols, rows)	0.285986	0.2949865	0.3022689	0.3019410	0.3090690	0.323716
3	rowTabulates(X[cols, rows])	0.283183	0.2971275	0.3120059	0.3023895	0.3084625	1.172120

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
2	rowTabulates(X, cols, rows)	1.035960	1.032414	1.030578	1.032969	1.031581	0.9647785
3	rowTabulates(X[cols, rows])	1.025806	1.039907	1.063776	1.034504	1.029557	3.4932973

Figure: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x100 data as well as rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 100x100 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	253.419	260.970	272.6863	265.683	280.109	362.675
2	rowTabulates_X_S	276.059	285.725	293.3003	292.304	299.607	335.534

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
2	rowTabulates_X_S	1.089338	1.094858	1.075596	1.100198	1.069609	0.9251644

Figure: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 100x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x10 matrix

> X <- data[["1000x10"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204238 171.2    5709258 305.0  5709258 305.0
Vcells 6172045  47.1   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204229 171.2    5709258 305.0  5709258 305.0
Vcells 6177123  47.2   22343563 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x10 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	0.233580	0.2362520	0.2447179	0.2401835	0.2475065	0.329554
3	colTabulates(X[rows, cols])	0.241319	0.2446845	0.2515279	0.2484270	0.2547075	0.311018
2	colTabulates(X, rows, cols)	0.241028	0.2443960	0.2583131	0.2500805	0.2600480	0.421301

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
3	colTabulates(X[rows, cols])	1.033132	1.035693	1.027828	1.034322	1.029094	0.9437543
2	colTabulates(X, rows, cols)	1.031886	1.034472	1.055554	1.041206	1.050671	1.2783975

Table: Benchmarking of rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 1000x10 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	0.276673	0.2801780	0.2889631	0.2851215	0.2919005	0.452778
3	rowTabulates(X[cols, rows])	0.287231	0.2913165	0.3008558	0.2962890	0.3055805	0.403375
2	rowTabulates(X, cols, rows)	0.286890	0.2910215	0.2980739	0.2965630	0.3026750	0.340465

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
3	rowTabulates(X[cols, rows])	1.038161	1.039755	1.041157	1.039168	1.046865	0.8908891
2	rowTabulates(X, cols, rows)	1.036928	1.038702	1.031529	1.040129	1.036912	0.7519469

Figure: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x10 data as well as rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 1000x10 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	233.580	236.252	244.7179	240.1835	247.5065	329.554
2	rowTabulates_X_S	276.673	280.178	288.9631	285.1215	291.9005	452.778

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	rowTabulates_X_S	1.184489	1.185929	1.180801	1.187099	1.179365	1.373911

Figure: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 1000x10 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

10x1000 matrix

> X <- data[["10x1000"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204444 171.2    5709258 305.0  5709258 305.0
Vcells 6172952  47.1   22343563 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204435 171.2    5709258 305.0  5709258 305.0
Vcells 6178030  47.2   22345455 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	0.297480	0.3047505	2.8225353	0.3099065	0.3148250	250.891787
3	colTabulates(X[rows, cols])	0.304955	0.3150990	0.3232788	0.3190925	0.3238450	0.573301
2	colTabulates(X, rows, cols)	0.305951	0.3151110	0.3291838	0.3212190	0.3255665	0.693542

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.0000000	1.000000	1.000000	1.0000000
3	colTabulates(X[rows, cols])	1.025128	1.033957	0.1145349	1.029641	1.028651	0.0022851
2	colTabulates(X, rows, cols)	1.028476	1.033997	0.1166270	1.036503	1.034119	0.0027643
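The `mean` and `max` ratios far below 1 in this table come from a single slow evaluation of the baseline (250.89 ms), not from a real speed difference; the medians agree to within about 4%. As a rough check, assuming `microbenchmark()` ran its default 100 evaluations per expression (the calls above do not set `times`), removing that one run brings the baseline mean back in line with the other two expressions:

```r
n <- 100                 # assumed: microbenchmark's default `times`
mean_all <- 2.8225353    # reported mean of colTabulates_X_S (ms)
max_obs  <- 250.891787   # reported max of colTabulates_X_S (ms)

# Mean of the remaining n - 1 evaluations once the slowest run is excluded
mean_rest <- (n * mean_all - max_obs) / (n - 1)
round(mean_rest, 4)      # about 0.3168 ms
```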

Table: Benchmarking of rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 10x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	0.291203	0.3048035	0.3166833	0.3105615	0.3189695	0.485174
3	rowTabulates(X[cols, rows])	0.302955	0.3157510	0.3264460	0.3193135	0.3318315	0.524493
2	rowTabulates(X, cols, rows)	0.301042	0.3162710	0.3273503	0.3213555	0.3335205	0.444905

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.0000000
3	rowTabulates(X[cols, rows])	1.040357	1.035917	1.030828	1.028181	1.040324	1.0810410
2	rowTabulates(X, cols, rows)	1.033787	1.037623	1.033683	1.034756	1.045619	0.9170009

Figure: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 10x1000 data as well as rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 10x1000 data (original and transposed). The top panel shows times in microseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	297.480	304.7505	2822.5354	309.9065	314.8250	250891.787
2	rowTabulates_X_S	291.203	304.8035	316.6833	310.5615	318.9695	485.174

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.0000000	1.000000	1.0000000	1.000000	1.000000	1.0000000
2	rowTabulates_X_S	0.9788994	1.000174	0.1121982	1.002114	1.013165	0.0019338

Figure: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 10x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

100x1000 matrix

> X <- data[["100x1000"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204643 171.2    5709258 305.0  5709258 305.0
Vcells 6195618  47.3   22345455 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204634 171.2    5709258 305.0  5709258 305.0
Vcells 6245696  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x1000 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.567425	1.639608	1.705160	1.655323	1.771980	2.380615
2	colTabulates(X, rows, cols)	1.645321	1.694262	1.870166	1.728113	1.860080	8.493609
3	colTabulates(X[rows, cols])	1.642920	1.715765	4.394435	1.736148	1.947884	252.808727

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	colTabulates(X, rows, cols)	1.049697	1.033334	1.096768	1.043974	1.049718	3.567821
3	colTabulates(X[rows, cols])	1.048165	1.046449	2.577139	1.048828	1.099269	106.194713

Table: Benchmarking of rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 100x1000 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.748535	1.842582	1.994827	1.887706	1.920214	8.649732
2	rowTabulates(X, cols, rows)	1.823317	1.918680	2.068009	1.945528	1.992203	8.626591
3	rowTabulates(X[cols, rows])	1.831858	1.923616	1.991026	1.959248	2.003563	2.853159

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.000000	1.000000	1.0000000	1.000000	1.000000	1.0000000
2	rowTabulates(X, cols, rows)	1.042768	1.041299	1.0366861	1.030631	1.037490	0.9973247
3	rowTabulates(X[cols, rows])	1.047653	1.043978	0.9980948	1.037899	1.043407	0.3298552

Figure: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 100x1000 data as well as rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 100x1000 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.567425	1.639608	1.705160	1.655323	1.771980	2.380615
2	rowTabulates_X_S	1.748535	1.842582	1.994827	1.887706	1.920214	8.649732

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	rowTabulates_X_S	1.115546	1.123795	1.169876	1.140385	1.083654	3.633402

Figure: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 100x1000 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

1000x100 matrix

> X <- data[["1000x100"]]
> rows <- sample.int(nrow(X), size = nrow(X) * 0.7)
> cols <- sample.int(ncol(X), size = ncol(X) * 0.7)
> X_S <- X[rows, cols]
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204859 171.2    5709258 305.0  5709258 305.0
Vcells 6196447  47.3   22345847 170.5 56666022 432.4
> colStats <- microbenchmark(colTabulates_X_S = colTabulates(X_S, na.rm = FALSE), `colTabulates(X, rows, cols)` = colTabulates(X, 
+     rows = rows, cols = cols, na.rm = FALSE), `colTabulates(X[rows, cols])` = colTabulates(X[rows, 
+     cols], na.rm = FALSE), unit = "ms")
> X <- t(X)
> X_S <- t(X_S)
> gc()
          used  (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 3204850 171.2    5709258 305.0  5709258 305.0
Vcells 6246525  47.7   22345847 170.5 56666022 432.4
> rowStats <- microbenchmark(rowTabulates_X_S = rowTabulates(X_S, na.rm = FALSE), `rowTabulates(X, cols, rows)` = rowTabulates(X, 
+     rows = cols, cols = rows, na.rm = FALSE), `rowTabulates(X[cols, rows])` = rowTabulates(X[cols, 
+     rows], na.rm = FALSE), unit = "ms")

Table: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x100 data. The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.418797	1.470277	1.498454	1.489486	1.504072	1.788356
2	colTabulates(X, rows, cols)	1.485973	1.556599	1.660665	1.563964	1.601022	8.140991
3	colTabulates(X[rows, cols])	1.484032	1.556742	1.657618	1.565163	1.593916	8.321581

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	colTabulates(X, rows, cols)	1.047347	1.058712	1.108252	1.050003	1.064458	4.552221
3	colTabulates(X[rows, cols])	1.045979	1.058808	1.106219	1.050808	1.059734	4.653202

Table: Benchmarking of rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on 1000x100 data (transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.819188	1.922722	2.013304	1.977278	2.113175	3.094882
2	rowTabulates(X, cols, rows)	1.904025	2.013527	2.153842	2.059273	2.173602	8.314753
3	rowTabulates(X[cols, rows])	1.907501	2.010804	2.187695	2.080598	2.263897	8.807484

	expr	min	lq	mean	median	uq	max
1	rowTabulates_X_S	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
2	rowTabulates(X, cols, rows)	1.046634	1.047228	1.069805	1.041468	1.028595	2.686614
3	rowTabulates(X[cols, rows])	1.048545	1.045811	1.086620	1.052254	1.071325	2.845822

Figure: Benchmarking of colTabulates_X_S(), colTabulates(X, rows, cols) and colTabulates(X[rows, cols]) on 1000x100 data as well as rowTabulates_X_S(), rowTabulates(X, cols, rows) and rowTabulates(X[cols, rows]) on the same data transposed. Outliers are displayed as crosses. Times are in milliseconds.

Table: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 1000x100 data (original and transposed). The top panel shows times in milliseconds and the bottom panel shows relative times.

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.418797	1.470277	1.498454	1.489486	1.504072	1.788356
2	rowTabulates_X_S	1.819188	1.922722	2.013304	1.977278	2.113175	3.094882

	expr	min	lq	mean	median	uq	max
1	colTabulates_X_S	1.000000	1.000000	1.000000	1.00000	1.000000	1.000000
2	rowTabulates_X_S	1.282205	1.307728	1.343587	1.32749	1.404969	1.730574

Figure: Benchmarking of colTabulates_X_S() and rowTabulates_X_S() on 1000x100 data (original and transposed). Outliers are displayed as crosses. Times are in milliseconds.

Appendix

Session information

R version 3.6.1 Patched (2019-08-27 r77078)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-3-6-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-6    matrixStats_0.55.0-9000 ggplot2_3.2.1          
[4] knitr_1.24              R.devices_2.16.0        R.utils_2.9.0          
[7] R.oo_1.22.0             R.methodsS3_1.7.1       history_0.0.0-9002     

loaded via a namespace (and not attached):
 [1] Biobase_2.45.0       bit64_0.9-7          splines_3.6.1       
 [4] network_1.15         assertthat_0.2.1     highr_0.8           
 [7] stats4_3.6.1         blob_1.2.0           robustbase_0.93-5   
[10] pillar_1.4.2         RSQLite_2.1.2        backports_1.1.4     
[13] lattice_0.20-38      glue_1.3.1           digest_0.6.20       
[16] colorspace_1.4-1     sandwich_2.5-1       Matrix_1.2-17       
[19] XML_3.98-1.20        lpSolve_5.6.13.3     pkgconfig_2.0.2     
[22] genefilter_1.66.0    purrr_0.3.2          ergm_3.10.4         
[25] xtable_1.8-4         mvtnorm_1.0-11       scales_1.0.0        
[28] tibble_2.1.3         annotate_1.62.0      IRanges_2.18.2      
[31] TH.data_1.0-10       withr_2.1.2          BiocGenerics_0.30.0 
[34] lazyeval_0.2.2       mime_0.7             survival_2.44-1.1   
[37] magrittr_1.5         crayon_1.3.4         statnet.common_4.3.0
[40] memoise_1.1.0        laeken_0.5.0         R.cache_0.13.0      
[43] MASS_7.3-51.4        R.rsp_0.43.1         tools_3.6.1         
[46] multcomp_1.4-10      S4Vectors_0.22.1     trust_0.1-7         
[49] munsell_0.5.0        AnnotationDbi_1.46.1 compiler_3.6.1      
[52] rlang_0.4.0          grid_3.6.1           RCurl_1.95-4.12     
[55] cwhmisc_6.6          rappdirs_0.3.1       labeling_0.3        
[58] bitops_1.0-6         base64enc_0.1-3      boot_1.3-23         
[61] gtable_0.3.0         codetools_0.2-16     DBI_1.0.0           
[64] markdown_1.1         R6_2.4.0             zoo_1.8-6           
[67] dplyr_0.8.3          bit_1.1-14           zeallot_0.1.0       
[70] parallel_3.6.1       Rcpp_1.0.2           vctrs_0.2.0         
[73] DEoptimR_1.0-8       tidyselect_0.2.5     xfun_0.9            
[76] coda_0.19-3

Total processing time was 15.56 secs.

Reproducibility

To reproduce this report, do:

html <- matrixStats:::benchmark('colRowTabulates_subset')
