I stumbled over the hint that it is better for performance to initialize your slices with a dedicated length and capacity if you know them. Sounds like it would make sense, but I wouldn't be me if I just accepted that without testing the hypothesis.
An example I use in real life is building a slice of ids for querying a database later on with those ids: iterating over the original data structure (in my case a map[string]SimonsStruct{Id int, MORE FIELDS}) and copying the ids out.
Normally I used make([]int, 0) (len == 0 & cap == 0), so let's see if it would be faster to initialize the slice directly with the right capacity and length.
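To make the use case concrete, here is a minimal sketch of that id-collection pattern. SimonsStruct is only hinted at in the text, so the fields here (beyond Id) are assumptions, as is the helper name collectIds:

```go
package main

import "fmt"

// SimonsStruct is a stand-in for the real struct; only Id matters here.
type SimonsStruct struct {
	Id int
	// MORE FIELDS
}

// collectIds copies all ids out of the map into a slice.
// Since len(m) is known, the slice can get the right capacity up front.
func collectIds(m map[string]SimonsStruct) []int {
	ids := make([]int, 0, len(m))
	for _, v := range m {
		ids = append(ids, v.Id)
	}
	return ids
}

func main() {
	m := map[string]SimonsStruct{
		"a": {Id: 1},
		"b": {Id: 2},
	}
	fmt.Println(len(collectIds(m))) // 2
}
```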
Keep in mind the tests only work if you know the size of the target slice upfront. If not, sadly this Benchmark Wednesday will not help you.
Benchmark Code
Bad: Initialize slice empty even if you know the target size
```go
var size int // see size values benchmarked later in the table

func BenchmarkEmptyInit(b *testing.B) {
	for n := 0; n < b.N; n++ {
		data := make([]int, 0)
		for k := 0; k < size; k++ {
			data = append(data, k)
		}
	}
}
```
Best for big size: Initialize slice with known capacity and add data with append
```go
var size int // see size values benchmarked later in the table

func BenchmarkKnownAppend(b *testing.B) {
	for n := 0; n < b.N; n++ {
		data := make([]int, 0, size)
		for k := 0; k < size; k++ {
			data = append(data, k)
		}
	}
}
```
Best for small & medium size: Initialize slice with known capacity & length and add data with direct access
```go
var size int // see size values benchmarked later in the table

func BenchmarkKnownDirectAccess(b *testing.B) {
	for n := 0; n < b.N; n++ {
		data := make([]int, size)
		for k := 0; k < size; k++ {
			data[k] = k
		}
	}
}
```
Results
The table shows the time it took for every example to initialize all its elements (only measured inside the benchmark loop). Keep an eye on the unit! (ns/ms/s)
| #Elements (size) | EmptyInit | KnownAppend | KnownDirectAccess |
|---|---|---|---|
| 1 | 31.00 ns | 1.52 ns | 0.72 ns |
| 100 | 852 ns | 81.4 ns | 59.1 ns |
| 100 000 | 1.11 ms | 0.22 ms | 0.20 ms |
| 1 000 000 | 10.76 ms | 3.13 ms | 3.14 ms |
| 100 000 000 | 2.48 s | 0.21 s | 0.22 s |
| 300 000 000 | 6.79 s | 0.90 s | 0.95 s |
Interpretation
That initializing the slice with length & capacity 0 would be the worst was obvious. As you can see, the append and the direct access approach are very close to each other. After repeating the benchmarks on different machines I found that it is more or less random which one outperforms the other, so both approaches can be viewed as equally fast.
I prefer the append approach over the direct access one, as my code will benefit if append gets optimized in future Go versions.
Currently both approaches have two memory accesses per entry:

1. Overwrite all allocated memory with the zero value (in our int case with 0)
2. Write the actual value
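That zero-fill (step 1) is observable: a slice made with a length already contains zero values before anything is written to it. A quick sketch:

```go
package main

import "fmt"

func main() {
	// make with a length zero-initializes every element (step 1 above).
	data := make([]int, 3)
	fmt.Println(data) // [0 0 0]

	// Direct access then overwrites those zeros (step 2 above).
	for k := range data {
		data[k] = k + 1
	}
	fmt.Println(data) // [1 2 3]
}
```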
If memory safety could be guaranteed, it may become possible to skip step 1 for the append approach, and then it would outperform the direct access one.
Heap vs. Stack
Another interesting number from the benchmark: you can see the big performance hit when using the heap instead of the stack.
All the numbers grow (more or less) linearly with the size of the slice. But from 100 to 100 000 there is a big bump: it should be 1000x slower, but it is >2500x slower. Here you can see the effect of allocating on the heap instead of the stack. The big 100 000 element slice does not fit on the fast stack anymore and must be allocated "by hand" (Go does that for you) on the heap.
Conclusion
The hint I found online was right: if you know the size of your target slice, always initialize it with that size as capacity. For small & medium size slices use the direct access approach; for very big slices use append.
Bonus: How append slows down your code on relocation
One reason why the empty slice init is so much slower has to do with how append works internally. append ensures that you can always add a new element to a slice. All is fine as long as there is enough capacity left in the slice. But if your slice is full, append needs to "grow" it.
The problem with growing is that there is no such operation in memory management. If you allocate a slice of capacity 10, you get that capacity and nothing more. Right after your slice there might be other data in memory which you are not allowed to access. So to grow a slice, Go takes the old capacity, (roughly) doubles it, and allocates a new chunk of memory of that size. Now you have a capacity of 20 for your slice and can add more data to it. This allocation alone is not super expensive. The problem is that we also want the old values in the new slice and need to copy them over, which is bad for our performance.
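The growth steps are easy to watch: starting from an empty slice, print len and cap every time the capacity changes. Each printed line marks one reallocation where append copied the whole backing array (the exact capacity sequence depends on your Go version, so no concrete values are claimed here):

```go
package main

import "fmt"

func main() {
	s := make([]int, 0) // len 0, cap 0: growth happens repeatedly
	prevCap := cap(s)
	for i := 0; i < 1000; i++ {
		s = append(s, i)
		if cap(s) != prevCap {
			// Capacity changed: append allocated a new backing array
			// and copied all old elements over.
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
	}
}
```

With make([]int, 0, 1000) instead, the loop would print nothing: no growth, no copying.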
Even more about the internals of slices can be found on: https://blog.golang.org/go-slices-usage-and-internals
Thanks for reading and see you next week!
You got any feedback? Would love to answer it on HackerNews