Benchmarking - jboursiquot/go-for-experienced-programmers GitHub Wiki

Go's testing package has built-in support for benchmarking the performance of Go code. In this section, we'll see how to write some simple benchmarks and use Go's built-in toolchain to analyze the result to identify areas for improvement.

CPU benchmarks

Let's start with a simple example by looking at the common recursive fibonacci implementation below:

04-testing/benchmarking/example/fib.go

func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

To benchmark this function, our code, living in something like fib_test.go, will look something like so:

04-testing/benchmarking/example/fib_test.go

func benchFib(i int, b *testing.B) {
	for n := 0; n < b.N; n++ {
		fib(i)
	}
}

func BenchmarkFib1(b *testing.B) { benchFib(1, b)}
func BenchmarkFib10(b *testing.B) { benchFib(10, b)}
func BenchmarkFib20(b *testing.B) { benchFib(20, b)}

To run our benchmarks, we invoke go test with the -bench flag like so:

$ go test -bench=. ./04-testing/benchmarking/example/...
goos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/04-testing/benchmarking/example
BenchmarkFib1-8         523323414                2.043 ns/op
BenchmarkFib10-8         6582650               181.6 ns/op
BenchmarkFib20-8           53408             22406 ns/op
PASS
ok      github.com/jboursiquot/go-for-experienced-programmers/04-testing/benchmarking/example       4.246s

Some observations:

  1. Unlike regular tests that start with Test, benchmarks start with Benchmark.
  2. The benchmark functions run the target code b.N times during which b.N is adjusted until the benchmark function lasts long enough to be timed reliably.
  3. The -bench flag accepts a regular expression to specify which benchmarks to run. When you use -bench=. (a period), it matches all benchmark functions in the package.

Interpreting the results:

goos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/04-testing/benchmarking/example
BenchmarkFib1-8         523323414                2.043 ns/op
...
  1. goos: darwin: The benchmarks are running on macOS.
  2. goarch: arm64: The benchmarks are running on an ARM64 architecture.
  3. pkg: github.com/jboursiquot/go-for-experienced-programmers/04-testing/benchmarking/example: The benchmarks are for the specified Go package.
  4. BenchmarkFib1-8 indicates the name of the benchmark function (BenchmarkFib1) and the number of CPU cores used (8 in this case).
  5. 523323414 indicates how many times the benchmark function was executed during the benchmark test. The benchmarking tool runs the function multiple times to get a reliable measurement.
  6. 2.043 ns/op is the average time taken for each iteration of the benchmark function, measured in nanoseconds per operation (ns/op).

Memory benchmarks

A variation on the benchmark above can includes memory allocation data to the results.

$ go test -bench=. -benchmem ./04-testing/benchmarking/example/...
goos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/04-testing/benchmarking/example
BenchmarkFib1-8         531585108                2.069 ns/op           0 B/op          0 allocs/op
BenchmarkFib10-8         6547386               182.6 ns/op             0 B/op          0 allocs/op
BenchmarkFib20-8           53047             22562 ns/op               0 B/op          0 allocs/op
PASS
ok      github.com/jboursiquot/go-for-experienced-programmers/04-testing/benchmarking/example       4.278s

The addition of a -benchmem flag on the go test -bench=. command will cause the output to contain two more columns:

  • allocs/op: The number of memory allocations per operation.
  • bytes/op: The number of bytes allocated per operation.

Using benchmarks for performance optimization

When considering performance optimization, it's important to focus on areas of your program where meaningful impact can be had. We'll use the benchmarking tools at our disposal to help us improve the performance of code that identifies palindromes (a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run).

Consider the initial implementation of the code:

./benchmarking/wordlens/unoptimized/wordlens.go

package unoptimized

type WordLens struct{}

func NewWordLens() WordLens {
	return WordLens{}
}

func (wl WordLens) FindPalindromes(words []string) map[string]int {
	res := make(map[string]int)
	for _, word := range words {
		if wl.isPalindrome(word) {
			res[word]++
		}
	}

	return res
}

func (wl WordLens) isPalindrome(word string) bool {
	runes := []rune(word)
	n := len(runes)
	for i := 0; i < n/2; i++ {
		if runes[i] != runes[n-1-i] {
			return false
		}
	}
	return true
}

We suspect the isPalindrome function implemtation might be suboptimal for a number of reasons:

  • Unnecessary conversion to []rune - Converting the string to a []rune is unnecessary for simple ASCII strings (which we expect), and it adds overhead may be significant.
  • Inefficient Loop - While the loop structure is correct, using []rune instead directly indexing the string increases complexity and memory usage.

How do we go from suspicion to evidence? Let's add a benchmark!

benchmarking/wordlens/unoptimized/wordlens_test.go

func BenchmarkFindPalindromes(b *testing.B) {
	b.StopTimer() // exclude preparations from the benchmark
	wl := unoptimized.NewWordLens()
	allWords := wordlens.TestWords() // returns a []string of 100 words, some of which are palindromes

	b.StartTimer() // run the benchmark
	for n := 25; n <= len(allWords); n = n + 25 {
		words := allWords[:n]
		b.Run(strconv.Itoa(n), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				wl.FindPalindromes(words)
			}
		})
	}
}

We'll use go test -bench to run the benchmark and capture the results in a file for future comparison.

$ go test -bench=BenchmarkFindPalindromes \
	-count=10 \
	-benchmem \
	-memprofile ./benchmarking/wordlens/benchmarks/unoptimized.mem.prof \
	-cpuprofile ./benchmarking/wordlens/benchmarks/unoptimized.cpu.prof \
	./benchmarking/wordlens/unoptimized | tee ./benchmarking/wordlens/benchmarks/unoptimized.bench.txt

Some exaplanations:

  • -bench=BenchmarkFindPalindromes restricts the benchmark specifically to the BenchmarkFindPalindromes function.
  • -count=10 is used with benchmarks to run them multiple times and get an average result (related -benchtime).
  • -benchmem is used when running benchmarks to include memory allocation statistics in the benchmark results.
  • memprofile is used to generate a memory profile. This profile can be analyzed to understand memory allocation and garbage collection behavior.
  • cpuprofile is used to generate a CPU profile. This profile can help you understand where your program spends its time, which functions are consuming the most CPU resources, and identify performance bottlenecks.

What gets generated are 3 files:

  1. benchmarking/wordlens/benchmarks/unoptimized.mem.prof
  2. benchmarking/wordlens/benchmarks/unoptimized.cpu.prof
  3. benchmarking/wordlens/benchmarks/unoptimized.bench.txt

benchmarking/wordlens/benchmarks/unoptimized.bench.txt

goos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/unoptimized
BenchmarkFindPalindromes/25-8            2876311               394.4 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            2979783               392.9 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            2857099               394.2 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            3083442               390.8 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            3072172               393.5 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            3025881               526.6 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            3071755               450.6 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            2998027               399.3 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            3038516               398.9 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            2972845               394.1 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/50-8             840024              1324 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             861870              1332 ns/op            1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             859623              1320 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             848818              1323 ns/op            1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             819519              1326 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             889968              1318 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             804273              1314 ns/op            1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             827196              1314 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             853264              1355 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8             848450              1362 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             661256              1798 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             658161              1731 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             649840              1728 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             653936              1741 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             675405              1726 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             672657              1747 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             649951              1824 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             644968              1744 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             657780              1715 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8             667927              1836 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/100-8            422007              2614 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            400995              2823 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            439744              2745 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            418948              2662 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            435814              2623 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            437161              3022 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            420610              2821 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            399866              2789 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            404553              2688 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            440857              2710 ns/op            3314 B/op          4 allocs/op
PASS
ok      github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/unoptimized  52.903s

Observations:

  1. 10 benchmarks are run for 25, 50, 75, and 100 words at a time to get a sense of how the FindPalindomes mehtod call will bahave as the number of words to search through increases.
  2. As expected, there are significant drops in the number of operations performed and an increase in the time it takes as more words are searched.
  3. The memory allocation doesn't vary wildly, indicating that our task should likely be one of CPU optimization.

Each benchmark should be run at least 10 times to gather a statistically significant sample of results. See benchstat.

Identify bottlenecks with profiles

So where's the bottleneck exactly? That's where our profiles come in:

  1. benchmarking/wordlens/benchmarks/unoptimized.mem.prof
  2. benchmarking/wordlens/benchmarks/unoptimized.cpu.prof

Since we believe we can have the most impact by improving CPU utilization over memory, we'll introspect the CPU profile.

Using go tool pprof, we'll launch the pprof tool in interactive CLI mode where we can issue commands to navigate the profile we captured.

$ go tool pprof ./benchmarking/wordlens/benchmarks/unoptimized.cpu.prof
File: unoptimized.test
Type: cpu
Time: Jun 13, 2024 at 9:26pm (EDT)
Duration: 52.73s, Total samples = 58.36s (110.67%)
Entering interactive mode (type "help" for commands, "o" for options)

(pprof) top

Showing nodes accounting for 50.55s, 86.62% of 58.36s total
Dropped 196 nodes (cum <= 0.29s)
Showing top 10 nodes out of 74
      flat  flat%   sum%        cum   cum%
    15.64s 26.80% 26.80%     15.64s 26.80%  runtime.kevent
    11.62s 19.91% 46.71%     12.04s 20.63%  runtime.stringtoslicerune
     9.77s 16.74% 63.45%      9.77s 16.74%  runtime.pthread_cond_wait
     2.58s  4.42% 67.87%      6.78s 11.62%  runtime.mapassign_faststr
     2.56s  4.39% 72.26%      2.56s  4.39%  runtime.usleep
     1.97s  3.38% 75.63%      1.97s  3.38%  runtime.madvise
     1.79s  3.07% 78.70%      1.79s  3.07%  runtime.pthread_cond_timedwait_relative_np
     1.59s  2.72% 81.43%      1.59s  2.72%  runtime.pthread_kill
     1.52s  2.60% 84.03%     13.56s 23.24%  github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/unoptimized.WordLens.isPalindrome
     1.51s  2.59% 86.62%      1.51s  2.59%  runtime.pthread_cond_signal

(pprof) exit
$ 

The top command shows the top 10 hotspots befault. Some explanations regarding flat flat% sum% cum cum%:

  1. flat: This column shows the total amount of time spent in a particular function itself, excluding the time spent in functions called by it.
  2. flat%: This column shows the percentage of the total profiling time spent in a particular function itself.
  3. sum%: This column shows the cumulative percentage of the total profiling time up to and including the current line in the sorted output.
  4. cum: This column shows the total amount of time spent in a particular function and all the functions it calls.
  5. cum%: This column shows the percentage of the total profiling time spent in a particular function and all the functions it calls.

We see that over 23% cumulative time is spent in our isPalindrome function. We decide to optimize that function by replacing it with the new and improved version below:

func (wl WordLens) isPalindrome(word string) bool {
	for i := range word {
		if word[i] != word[len(word)-1-i] {
			return false
		}
	}
	return true
}

We'll run the benchmark again, making sure to capture the results in a new set of files for comparison:

$ go test -bench=BenchmarkFindPalindromes \
	-count=10 \
	-benchmem \
	-memprofile ./benchmarking/wordlens/benchmarks/optimized.mem.prof \
	-cpuprofile ./benchmarking/wordlens/benchmarks/optimized.cpu.prof \
	./benchmarking/wordlens/optimized | tee ./benchmarking/wordlens/benchmarks/optimized.bench.txt

What gets generated are 3 files:

  1. benchmarking/wordlens/benchmarks/optimized.mem.prof
  2. benchmarking/wordlens/benchmarks/optimized.cpu.prof
  3. benchmarking/wordlens/benchmarks/optimized.bench.txt

benchmarking/wordlens/benchmarks/optimized.bench.txt

oos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/optimized
BenchmarkFindPalindromes/25-8            6887496               154.5 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7709070               153.8 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7707412               161.3 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7723220               154.3 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            6657597               155.0 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7688935               154.4 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7792106               154.3 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7812643               199.4 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7627429               158.0 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/25-8            7582220               157.0 ns/op             0 B/op          0 allocs/op
BenchmarkFindPalindromes/50-8            1000000              1280 ns/op            1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1371324               876.6 ns/op          1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1354190               888.1 ns/op          1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1393686               884.5 ns/op          1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1310724               845.2 ns/op          1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1373640              1008 ns/op            1371 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1324628               851.8 ns/op          1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1391211               828.4 ns/op          1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1429286               858.3 ns/op          1370 B/op          2 allocs/op
BenchmarkFindPalindromes/50-8            1347831               829.0 ns/op          1370 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1071 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1042 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1051 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1040 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1044 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1044 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1043 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1052 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1043 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/75-8            1000000              1042 ns/op            1414 B/op          2 allocs/op
BenchmarkFindPalindromes/100-8            618550              1719 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            644259              1749 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            640963              1738 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            636975              1759 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            627584              1722 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            641410              1724 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            641762              1719 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            639453              1724 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            660934              1721 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            651400              1732 ns/op            3314 B/op          4 allocs/op
PASS
ok      github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/optimized    56.895s

At first glance, it looks like we had some improvements:

Exerpt from unoptimized benchmark:

BenchmarkFindPalindromes/100-8            422007              2614 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            400995              2823 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            439744              2745 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            418948              2662 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            435814              2623 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            437161              3022 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            420610              2821 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            399866              2789 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            404553              2688 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            440857              2710 ns/op            3314 B/op          4 allocs/op

Excerpt from optimized benchmark:

BenchmarkFindPalindromes/100-8            618550              1719 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            644259              1749 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            640963              1738 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            636975              1759 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            627584              1722 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            641410              1724 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            641762              1719 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            639453              1724 ns/op            3313 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            660934              1721 ns/op            3314 B/op          4 allocs/op
BenchmarkFindPalindromes/100-8            651400              1732 ns/op            3314 B/op          4 allocs/op

We can get a more complete comparison using benchstat, a tool that computes statistical summaries and A/B comparisons of Go benchmarks.

Let's install it:

$ go install golang.org/x/perf/cmd/benchstat@latest

We can now compare the benchmarks:

$ benchstat -filter ".unit:(sec/op)" \
		unoptimized=./benchmarking/wordlens/benchmarks/unoptimized.bench.txt \
		optimized=./benchmarking/wordlens/benchmarks/optimized.bench.txt

Memory allocation details filtered out to keep the focus on CPU variance details.

goos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/optimized
                      │  optimized   │
                      │    sec/op    │
FindPalindromes/25-8    154.8n ±  4%
FindPalindromes/50-8    867.5n ± 16%
FindPalindromes/75-8    1.044µ ±  1%
FindPalindromes/100-8   1.724µ ±  1%
geomean                 701.0n

pkg: github.com/jboursiquot/go-for-experienced-programmers/benchmarking/wordlens/unoptimized
                      │ unoptimized  │
                      │    sec/op    │
FindPalindromes/25-8    394.3n ± 14%
FindPalindromes/50-8    1.324µ ±  2%
FindPalindromes/75-8    1.743µ ±  5%
FindPalindromes/100-8   2.728µ ±  4%
geomean                 1.255µ

When there are benchmarks with different configuration (for example, from different packages as in this case), benchstat prints separate tables for each configuration.

The term “geomean” in the context of benchmark results refers to the geometric mean. The geometric mean is a type of average that is useful for sets of positive numbers that are interpreted according to their product rather than their sum (as with the arithmetic mean). It is particularly appropriate for measuring rates of growth or, in this case, summarizing performance metrics that span several orders of magnitude.

Additional exploration:

  • Use go tool pprof to see how the optimized version of the program fairs now with regards to CPU utilization.
  • Use go tool pprof -http=8080 to launch a browser-based explorer of the profile and poke around.

Exercise 1

Now that you've seen what a typical performance optimization workflow might look like, it's time to apply what you've learned. In this exercise, you are tasked with investigating if you can further optimize the WordLens type's FindPalindromes method. Given that determining if one word is a palindrome does not have any dependencies on any other word, this creates an "embarassingly parallel" type of opportunity for improvement. Or does it?

Objectives

  1. Before you introduce concurrency, run the benchmarks for the optimized version of the program first in order to generate the 04-testing/benchmarking/wordlens/benchmarks/optimized.* files that you'll need for go tool pprof and for benchstat later on.
  2. Refactor the FindPalindromes method to perform its work concurrently using the sync package's WaitGroup and Mutex types.
  3. Run the benchmarks on the new version of the code, making sure to not overwrite the initial profiles.
  4. Compare the results using the techniques above. Was there any improvement to the program when the work was done concurrently? Show the benchmarks to prove it.

Sample solution located e1/*. Makefile has useful targets you can emulate.

Discussion

  • Were the benchmark results surprising to you?
  • Do you think using a different concurrency technique would be better?

Exercise 2

In the previous exerice, you were asked to use the sync package's WaitGroup and Mutex types to implement concurrent palindrome search. In this exercise, your task is to investigate the effect of additional concurrency techniques, mainly:

  • using channels for communication (no sync.WaitGroup or sync.Mutex)
  • using a worker pool

Objectives

  1. Use channels synchronization and capture benchmark results.
  2. Use worker pool and capture benchmark results.
  3. Use benchstat to compare the benchmarks for the optimized sequential version, the version from exervice 1 (uses sync.WaitGroup and sync.Mutex), and the two versions you'll create in this exercise.
  4. You will be able to produce better formatted results with benchstat if all the implementations are benchmarked from the same package. See solution (e2/*.go) for how the instructor achieved this for educational purposes.

Sample Output

Ultimately, your results should provide insights similar to the following:

$ benchstat \
	seq=./e2/benchmarks/sequential.bench.txt \
	mutex=./e2/benchmarks/mutex.bench.txt \
	channel=./e2/benchmarks/channel.bench.txt \
	workers=./e2/benchmarks/workers.bench.txt
goos: darwin
goarch: arm64
pkg: github.com/jboursiquot/go-for-experienced-programmers/e2
                      │     seq     │                  mutex                  │                 channel                 │                 workers                 │
                      │   sec/op    │    sec/op      vs base                  │    sec/op      vs base                  │    sec/op      vs base                  │
FindPalindromes/25-8    239.8n ± 1%    5772.0n ± 5%  +2307.51% (p=0.000 n=10)    7323.0n ± 5%  +2954.43% (p=0.000 n=10)    9358.0n ± 1%  +3803.23% (p=0.000 n=10)
FindPalindromes/50-8    935.2n ± 1%   10926.0n ± 2%  +1068.24% (p=0.000 n=10)   12366.5n ± 3%  +1222.27% (p=0.000 n=10)   17392.5n ± 5%  +1759.66% (p=0.000 n=10)
FindPalindromes/75-8    1.142µ ± 1%    15.773µ ± 4%  +1281.13% (p=0.000 n=10)    16.825µ ± 1%  +1373.29% (p=0.000 n=10)    24.055µ ± 4%  +2006.39% (p=0.000 n=10)
FindPalindromes/100-8   1.822µ ± 1%    19.950µ ± 6%   +994.95% (p=0.000 n=10)    22.024µ ± 1%  +1108.78% (p=0.000 n=10)    32.071µ ± 6%  +1660.21% (p=0.000 n=10)
geomean                 826.5n          11.87µ       +1336.09%                    13.53µ       +1537.65%                    18.82µ       +2177.67%

                      │     seq      │                 mutex                 │                channel                 │                workers                │
                      │     B/op     │     B/op      vs base                 │     B/op       vs base                 │     B/op      vs base                 │
FindPalindromes/25-8      272.0 ± 0%    1872.0 ± 0%  +588.24% (p=0.000 n=10)     2408.0 ± 0%  +785.29% (p=0.000 n=10)    1170.0 ± 0%  +330.15% (p=0.000 n=10)
FindPalindromes/50-8    1.604Ki ± 0%   4.729Ki ± 0%  +194.95% (p=0.000 n=10)    5.722Ki ± 0%  +256.82% (p=0.000 n=10)   2.952Ki ± 0%   +84.10% (p=0.000 n=10)
FindPalindromes/75-8    1.646Ki ± 0%   6.334Ki ± 0%  +284.73% (p=0.000 n=10)    7.701Ki ± 0%  +367.73% (p=0.000 n=10)   3.370Ki ± 0%  +104.69% (p=0.000 n=10)
FindPalindromes/100-8   3.501Ki ± 0%   9.752Ki ± 0%  +178.51% (p=0.000 n=10)   11.619Ki ± 0%  +231.84% (p=0.000 n=10)   5.726Ki ± 0%   +63.52% (p=0.000 n=10)
geomean                 1.252Ki        4.807Ki       +284.03%                   5.891Ki       +370.56%                  2.840Ki       +126.90%

                      │    seq     │                 mutex                  │                channel                 │               workers                │
                      │ allocs/op  │  allocs/op    vs base                  │  allocs/op    vs base                  │  allocs/op   vs base                 │
FindPalindromes/25-8    3.000 ± 0%    53.000 ± 0%  +1666.67% (p=0.000 n=10)    56.000 ± 0%  +1766.67% (p=0.000 n=10)   14.000 ± 0%  +366.67% (p=0.000 n=10)
FindPalindromes/50-8    5.000 ± 0%   105.000 ± 0%  +2000.00% (p=0.000 n=10)   108.000 ± 0%  +2060.00% (p=0.000 n=10)   16.000 ± 0%  +220.00% (p=0.000 n=10)
FindPalindromes/75-8    5.000 ± 0%   155.000 ± 0%  +3000.00% (p=0.000 n=10)   158.000 ± 0%  +3060.00% (p=0.000 n=10)   16.000 ± 0%  +220.00% (p=0.000 n=10)
FindPalindromes/100-8   7.000 ± 0%   207.000 ± 0%  +2857.14% (p=0.000 n=10)   210.000 ± 0%  +2900.00% (p=0.000 n=10)   18.000 ± 0%  +157.14% (p=0.000 n=10)
geomean                 4.787          115.6       +2314.92%                    119.0       +2386.46%                   15.94       +232.94%