Go performance tuning document

Posted by huydx on 02/06/2017

Repost from https://github.com/dgryski/go-perfbook/blob/master/TODO

blog posts:

                - use integer map keys if possible
                - hard to compete with Go's map implementation; esp. if your data structure has lots of pointer chasing
                - aes-ni instructions make string hashing much faster
                - prefer structs to maps if you know the map keys (esp. coming from perl, etc)
                - channels are useful, but slow; raw atomics can help with performance
                - cgo has overhead
                - profile before optimizing
            - don't waste programmer cycles saving the wrong CPU cycles (or memory allocations)
            - bash$ time; time.Now()/time.Since(); pprof.StartCPUProfile/pprof.StopCPUProfile; go tool pprof http://.../profile
            - bash$ ps; runtime.ReadMemStats(); runtime.WriteHeapProfile(); go tool pprof http://.../heap
            - slice operations are sometimes O(n)
            - https://golang.org/pkg/runtime/debug/
            - sync.Pool (basically)
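
A minimal sketch of the "profile before optimizing" workflow from the notes above, combining time.Now()/time.Since(), pprof.StartCPUProfile/pprof.StopCPUProfile and runtime.ReadMemStats(). The checksum workload, the profileCPU helper and the temp-file name pattern are hypothetical stand-ins, not part of the original notes.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// checksum is a hypothetical CPU-bound workload standing in for the code under study.
func checksum(n int64) int64 {
	var total int64
	for i := int64(0); i < n; i++ {
		total = (total + i*i) % 1000003
	}
	return total
}

// profileCPU (hypothetical helper) runs work while a CPU profile is written to w.
func profileCPU(w io.Writer, work func()) error {
	if err := pprof.StartCPUProfile(w); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	// Assumption: the profile goes to a temp file; any writable path works.
	f, err := os.CreateTemp("", "cpu-*.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()

	var result int64
	if err := profileCPU(f, func() { result = checksum(50000000) }); err != nil {
		log.Fatal(err)
	}

	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)

	fmt.Printf("result=%d elapsed=%s mallocs=%d profile=%s\n",
		result, elapsed, after.Mallocs-before.Mallocs, f.Name())
}
```

The resulting file can be inspected with go tool pprof, which is the same tool the notes point at the http://.../profile endpoint.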

https://methane.github.io/2015/02/reduce-allocation-in-go-code

        - 1. correctness is important
        - 2. BenchmarkXXX with b.ReportAllocs() (or -benchmem when running)
        - 3. GODEBUG=allocfreetrace=1 prints a stack trace on every allocation
        - strategies:
            - avoid string concat; use []byte+append() (+strconv.AppendInt(), ...)
            - compare benchmark runs with benchcmp
            - avoid time.Format
            - avoid range when iterating strings ([]rune conversion + utf8 decoding)
            - can append string to []byte
            - write two versions, one for string, one for []byte (avoids conversion+copy (sometimes...))
            - reuse existing buffers instead of creating new ones
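
The strategies above can be sketched as follows: one version builds a string by concatenation, the other appends into a caller-supplied []byte with append() and strconv.AppendInt() and reuses the same buffer across calls. The function names and the "requests" key are hypothetical; testing.AllocsPerRun is used here only as a quick stand-in for a proper BenchmarkXXX with b.ReportAllocs().

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the concatenated string reachable so the result is not discarded.
var sink string

// formatConcat builds "key=value" with string concatenation.
func formatConcat(key string, value int64) string {
	return key + "=" + strconv.FormatInt(value, 10)
}

// appendKV appends "key=value" to dst and returns the extended slice.
func appendKV(dst []byte, key string, value int64) []byte {
	dst = append(dst, key...) // a string can be appended to a []byte directly
	dst = append(dst, '=')
	return strconv.AppendInt(dst, value, 10)
}

func main() {
	buf := make([]byte, 0, 64) // allocated once, reused via buf[:0]

	concatAllocs := testing.AllocsPerRun(1000, func() {
		sink = formatConcat("requests", 12345)
	})
	appendAllocs := testing.AllocsPerRun(1000, func() {
		buf = appendKV(buf[:0], "requests", 12345)
	})

	fmt.Printf("concat: %.0f allocs/op, append: %.0f allocs/op\n", concatAllocs, appendAllocs)
}
```

The exact allocation counts depend on the Go version and on escape analysis, so they are worth measuring rather than assuming.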

http://bravenewgeek.com/so-you-wanna-go-fast/

            - performance fast vs. delivery fast; make the right decision
            - lock-free ring buffer vs. channels: the ring buffer is faster except with GOMAXPROCS=1
            - defer has a cost (allocation+cpu)
                BenchmarkMutexDeferUnlock-8 20000000 96.6 ns/op
                BenchmarkMutexUnlock-8 100000000 19.5 ns/op
            - reflection+json
                - ffjson avoids reflection
                - msgp avoids json
                - interfaces have dynamic dispatch which can't be inlined
                - => use concrete types (+ code duplication)
            - heap vs. stack; escape analysis
            - lots of short-lived objects are expensive for the gc
            - sync.Pool reuses objects *between* gc runs
            - you need your own free list to hold onto things across gc runs
                (but now you're subverting the purpose of a garbage collector)
            - false sharing
            - custom lock-free data structures: fast but *hard*
            - "Speed comes at the cost of simplicity, at the cost of development time, and at the cost of continued maintenance. Choose wisely."
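
As a rough illustration of the sync.Pool point above, the sketch below reuses bytes.Buffer values instead of allocating a new one per call. The bufPool variable and the render function are hypothetical names chosen for this example; pooled objects may still be dropped by the garbage collector, so this only reduces allocations between gc runs.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New is called only when the pool is empty.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render (hypothetical) formats a greeting using a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so returning it is safe
}

func main() {
	for _, name := range []string{"gopher", "world"} {
		fmt.Println(render(name))
	}
}
```

Always reset an object taken from the pool before use, and never keep a reference to it after Put.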

cgo:

cgo has overhead
    (which has only gotten more expensive over time) -- ~200 ns/call
the SSA backend means less difference in codegen between Go and C compilers
think hard about whether you really want cgo: http://dave.cheney.net/2016/01/18/cgo-is-not-go

videos:

https://gophervids.appspot.com/#tags=optimization
"Profiling and Optimizing Go" (Uber)
https://www.youtube.com/watch?v=N3PWzBeLX2M
https://go-talks.appspot.com/github.com/davecheney/presentations/writing-high-performance-go.slide
https://www.youtube.com/watch?v=zWp0N9unJFc
Björn Rabenstein: https://docs.google.com/presentation/d/1Zu0BdbhMRar7ycEwDi8jepGokTXTDXlKFf7C13tusuI/edit
https://www.youtube.com/watch?v=ZuQcbqYK0BY
https://go-talks.appspot.com/github.com/mkevac/golangmoscow2016/gomeetup.slide
CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures"
https://www.youtube.com/watch?v=fHNmRkzxHWs
Performance Engineering of Software Systems
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2010/
https://talks.golang.org/2013/highperf.slide#1
Machine Architecture: Things Your Programming Language Never Told You
https://www.youtube.com/watch?v=L7zSU9HI-6I
7 Ways to Profile Go Applications
https://www.youtube.com/watch?v=2h_NFBFrciI
dotGo 2016 - Damian Gryski - Slices: Performance through cache-friendliness
https://www.youtube.com/watch?v=jEG4Qyo_4Bc

asm:

https://golang.org/doc/asm
https://goroutines.com/asm
http://www.doxsey.net/blog/go-and-assembly
https://www.youtube.com/watch?v=9jpnFmJr2PE
https://blog.gopheracademy.com/advent-2016/peachpy/
https://blog.sgmansfield.com/2017/04/a-foray-into-go-assembly-programming/
http://lemire.me/blog/2016/12/21/performance-overhead-when-calling-assembly-from-go/

posts:

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
https://arxiv.org/abs/1509.05053
http://grokbase.com/t/gg/golang-nuts/155ea0t5hf/go-nuts-after-set-gomaxprocs-different-machines-have-different-bahaviors-some-speed-up-some-slow-down
http://grokbase.com/t/gg/golang-nuts/14138jw64s/go-nuts-concurrent-read-write-of-different-parts-of-a-slice

Escape Analysis Flaws
https://docs.google.com/document/d/1CxgUBPlx9iJzkz9JWkb6tIpTe5q32QDmz8l0BouG0Cw/preview

https://hackernoon.com/optimizing-optimizing-some-insights-that-led-to-a-400-speedup-of-powerdns-5e1a44b58f1c
http://leto.net/docs/C-optimization.php

tools:

https://godoc.org/github.com/aclements/go-perf
https://godoc.org/golang.org/x/perf/cmd/benchstat
https://github.com/uber/go-torch
https://github.com/rakyll/gom
https://github.com/tam7t/sigprof
https://github.com/aybabtme/dpprof
https://github.com/wblakecaldwell/profiler
https://github.com/MiniProfiler/go
https://perf.wiki.kernel.org/index.php/Main_Page
https://github.com/dominikh/go-structlayout
http://www.brendangregg.com/perf.html
https://github.com/davecheney/gcvis
https://github.com/pavel-paulau/gcterm

trace:

https://making.pusher.com/go-tool-trace/
https://www.youtube.com/watch?v=mmqDlbWk_XA
https://www.youtube.com/watch?v=nsM_m4hZ-bA

papers:

https://www.akkadia.org/drepper/cpumemory.pdf
https://software.intel.com/sites/default/files/article/392271/aos-to-soa-optimizations-using-iterative-closest-point-mini-app.pdf

optimization guides:

http://developer.amd.com/resources/developer-guides-manuals/
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.uan0015b/index.html
https://www-ssl.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

stackoverflow:

https://stackoverflow.com/questions/19397699/why-struct-with-padding-fields-works-faster/19397791#19397791
https://stackoverflow.com/questions/10017026/no-speedup-in-multithread-program/10017482#10017482

practice:
https://twitter.com/dgryski/status/584682584942194689