I stumbled over the hint that it is better for performance to initialize your slices with a dedicated length and capacity if you know them. Sounds like it would make sense, but I wouldn't be me if I just accepted that without testing the hypothesis.
An example I use in real life is building a slice of ids for querying a database later on with those ids: iterating over the original data structure (in my case a map[string]SimonsStruct{Id int, MORE FIELDS}) and copying the ids out.
Normally I used make([]int, 0) (len == 0 & cap == 0), so let's see if it would be faster to initialize the slice directly with the right capacity and length.
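To make the use case concrete, here is a minimal sketch of that id-collection pattern. SimonsStruct is only hinted at in the text, so the fields here (beyond Id) are assumptions, as is the helper name collectIds:

```go
package main

import "fmt"

// SimonsStruct is a stand-in for the real struct; only Id matters here.
type SimonsStruct struct {
	Id int
	// MORE FIELDS
}

// collectIds copies all ids out of the map into a slice.
// Since len(m) is known, the slice can get the right capacity up front.
func collectIds(m map[string]SimonsStruct) []int {
	ids := make([]int, 0, len(m))
	for _, v := range m {
		ids = append(ids, v.Id)
	}
	return ids
}

func main() {
	m := map[string]SimonsStruct{
		"a": {Id: 1},
		"b": {Id: 2},
	}
	fmt.Println(len(collectIds(m))) // 2
}
```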
Keep in mind the tests only work if you know the size of the target slice upfront. If not, sadly this Benchmark Wednesday will not help you.
Benchmark Code
Bad: Initialize slice empty even if you know the target size
```go
var size int // see size values benchmarked later in the table

func BenchmarkEmptyInit(b *testing.B) {
	for n := 0; n < b.N; n++ {
		data := make([]int, 0)
		for k := 0; k < size; k++ {
			data = append(data, k)
		}
	}
}
```
Best for big size: Initialize slice with known capacity and add data with append
```go
var size int // see size values benchmarked later in the table

func BenchmarkKnownAppend(b *testing.B) {
	for n := 0; n < b.N; n++ {
		data := make([]int, 0, size)
		for k := 0; k < size; k++ {
			data = append(data, k)
		}
	}
}
```
Best for small & medium size: Initialize slice with known capacity & length and add data with direct access
```go
var size int // see size values benchmarked later in the table

func BenchmarkKnownDirectAccess(b *testing.B) {
	for n := 0; n < b.N; n++ {
		data := make([]int, size)
		for k := 0; k < size; k++ {
			data[k] = k
		}
	}
}
```
Results
The table shows the time it took for every example to initialize all its elements (only measured inside the benchmark loop). Keep an eye on the unit! (ns/ms/s)
| #Elements (size) | EmptyInit | KnownAppend | KnownDirectAccess |
|---|---|---|---|
| 1 | 31.00 ns | 1.52 ns | 0.72 ns |
| 100 | 852 ns | 81.4 ns | 59.1 ns |
| 100 000 | 1.11 ms | 0.22 ms | 0.20 ms |
| 1 000 000 | 10.76 ms | 3.13 ms | 3.14 ms |
| 100 000 000 | 2.48 s | 0.21 s | 0.22 s |
| 300 000 000 | 6.79 s | 0.90 s | 0.95 s |
Interpretation
That initializing the slice with length & capacity 0 would be the worst was obvious. As you can see, the append and the direct access approach are very close to each other. After repeating the benchmarks on different machines I found that it is more or less random which one outperforms the other, so both approaches can be viewed as equally fast.
I prefer the append approach over the direct access one, as my code will benefit if append gets optimized in future Go versions.
Currently both approaches have two memory accesses per entry:

1. Overwrite all allocated memory with the zero value (in our int case with 0)
2. Write the actual value
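That zero-fill (step 1) is observable: a slice made with a length already contains zero values before anything is written to it. A quick sketch:

```go
package main

import "fmt"

func main() {
	// make with a length zero-initializes every element (step 1 above).
	data := make([]int, 3)
	fmt.Println(data) // [0 0 0]

	// Direct access then overwrites those zeros (step 2 above).
	for k := range data {
		data[k] = k + 1
	}
	fmt.Println(data) // [1 2 3]
}
```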
If memory safety could be guaranteed, it may become possible to skip step 1 for the append approach, and then it would outperform the direct access one.
Heap vs. Stack
Another interesting number from the benchmark: you can see the big performance hit when using the heap instead of the stack.
All the numbers grow (more or less) linearly with the size of the slice. But from 100 to 100 000 there is a big bump: it should be 1000x slower, but it is >2500x slower. Here you can see the effect of allocating on the heap instead of the stack. The big 100 000 element slice does not fit on the fast stack anymore and must be allocated "by hand" (Go does that for you) on the heap.
Conclusion
The hint I found online was right: if you know the size of your target slice, always initialize it with that size as capacity. For small & medium size slices use the direct access approach; for very big slices use append.
Bonus: How append slows down your code on relocation
One reason why the empty slice init is so much slower has to do with how append works internally. append ensures that you can always add a new element to a slice. All is fine as long as there is enough capacity left in the slice. But if your slice is full, append needs to "grow" it.
The problem with growing is that there is no such operation in memory management. If you allocate a slice of capacity 10, you get that capacity and nothing more. Right after your slice there might be other data in memory which you are not allowed to access. So to grow a slice, Go takes the old capacity, (roughly) doubles it, and allocates a new chunk of memory of that size. Now you have a capacity of 20 for your slice and can add more data to it. This allocation alone is not super expensive. The problem is that we also want the old values in the new slice and need to copy them over, which is bad for our performance.
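The growth steps are easy to watch: starting from an empty slice, print len and cap every time the capacity changes. Each printed line marks one reallocation where append copied the whole backing array (the exact capacity sequence depends on your Go version, so no concrete values are claimed here):

```go
package main

import "fmt"

func main() {
	s := make([]int, 0) // len 0, cap 0: growth happens repeatedly
	prevCap := cap(s)
	for i := 0; i < 1000; i++ {
		s = append(s, i)
		if cap(s) != prevCap {
			// Capacity changed: append allocated a new backing array
			// and copied all old elements over.
			fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
			prevCap = cap(s)
		}
	}
}
```

With make([]int, 0, 1000) instead, the loop would print nothing: no growth, no copying.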
Even more about the internals of slices can be found on: https://blog.golang.org/go-slices-usage-and-internals
Thanks for reading and see you next week!
You got any feedback? Would love to answer it on HackerNews