from Hacker News

Go 1.8 toolchain improvements

by spacey on 11/19/16, 1:18 PM with 16 comments
by svetly0 on 11/19/16, 4:48 PM
MIPS32 support - kudos to Vladimir Stefanovic and Imagination Technologies for making this happen. Many people from the embedded world will also greatly appreciate support for soft-float MIPS32 hardware.
by 0xmohit on 11/19/16, 5:25 PM
There are scores of other optimizations [0] as well:
  Optimizations:
  
  bytes, strings: optimize for ASCII sets (CL 31593)
  bytes, strings: optimize multi-byte index operations on s390x (CL 32447)
  bytes,strings: use IndexByte more often in Index on AMD64 (CL 31690)
  bytes: Use the same algorithm as strings for Index (CL 22550)
  bytes: improve WriteRune performance (CL 28816)
  bytes: improve performance for bytes.Compare on ppc64x (CL 30949)
  bytes: make IndexRune faster (CL 28537)
  cmd/asm, go/build: invoke cmd/asm only once per package (CL 27636)
  cmd/compile, cmd/link: more efficient typelink generation (CL 31772)
  cmd/compile, cmd/link: stop generating unused go.string.hdr symbols. (CL 31030)
  cmd/compile,runtime: redo how map assignments work (CL 30815)
  cmd/compile/internal/obj/x86: eliminate some function prologues (CL 24814)
  cmd/compile/internal/ssa: generate bswap on AMD64 (CL 32222)
  cmd/compile: accept literals in samesafeexpr (CL 26666)
  cmd/compile: add more non-returning runtime calls (CL 28965)
  cmd/compile: add size hint to map literal allocations (CL 23558)
  cmd/compile: be more aggressive in tighten pass for booleans (CL 28390)
  cmd/compile: directly construct Fields instead of ODCLFIELD nodes (CL 31670)
  cmd/compile: don't reserve X15 for float sub/div any more (CL 28272)
  cmd/compile: don’t generate pointless gotos during inlining (CL 27461)
  cmd/compile: fold negation into comparison operators (CL 28232)
  cmd/compile: generate makeslice calls with int arguments (CL 27851)
  cmd/compile: handle e == T comparison more efficiently (CL 26660)
  cmd/compile: improve s390x SSA rules for logical ops (CL 31754)
  cmd/compile: improve s390x rules for folding ADDconst into loads/stores (CL 30616)
  cmd/compile: improve string iteration performance (CL 27853)
  cmd/compile: improve tighten pass (CL 28712)
  cmd/compile: inline _, ok = i.(T) (CL 26658)
  cmd/compile: inline atomics from runtime/internal/atomic on amd64 (CL 27641, CL 27813)
  cmd/compile: inline convT2{I,E} when result doesn't escape (CL 29373)
  cmd/compile: inline x, ok := y.(T) where T is a scalar (CL 26659)
  cmd/compile: intrinsify atomic operations on s390x (CL 31614)
  cmd/compile: intrinsify math/big.mulWW, divWW on AMD64 (CL 30542)
  cmd/compile: intrinsify runtime/internal/atomic.Xaddint64 (CL 29274)
  cmd/compile: intrinsify slicebytetostringtmp when not instrumenting (CL 29017)
  cmd/compile: intrinsify sync/atomic for amd64 (CL 28076)
  cmd/compile: make [0]T and [1]T SSAable types (CL 32416)
  cmd/compile: make link register allocatable in non-leaf functions (CL 30597)
  cmd/compile: missing float indexed loads/stores on amd64 (CL 28273)
  cmd/compile: move stringtoslicebytetmp to the backend (CL 32158)
  cmd/compile: only generate ·f symbols when necessary (CL 31031)
  cmd/compile: optimize bool to int conversion (CL 22711)
  cmd/compile: optimize integer "in range" expressions (CL 27652)
  cmd/compile: remove Zero and NilCheck for newobject (CL 27930)
  cmd/compile: remove duplicate nilchecks (CL 29952)
  cmd/compile: remove some write barriers for stack writes (CL 30290)
  cmd/compile: simplify div/mod on ARM (CL 29390)
  cmd/compile: statically initialize some interface values (CL 26668)
  cmd/compile: unroll comparisons to short constant strings (CL 26758)
  cmd/compile: use 2-result divide op (CL 25004)
  cmd/compile: use masks instead of branches for slicing (CL 32022)
  cmd/compile: when inlining ==, don’t take the address of the values (CL 22277)
  container/heap: remove one unnecessary comparison in Fix (CL 24273)
  crypto/elliptic: add s390x assembly implementation of NIST P-256 Curve (CL 31231)
  crypto/sha256: improve performance for sha256.block on ppc64le (CL 32318)
  crypto/sha512: improve performance for sha512.block on ppc64le (CL 32320)
  crypto/{aes,cipher}: add optimized implementation of AES-GCM for s390x (CL 30361)
  encoding/asn1: reduce allocations in Marshal (CL 27030)
  encoding/csv: avoid allocations when reading records (CL 24723)
  encoding/hex: change lookup table from string to array (CL 27254)
  encoding/json: Use a lookup table for safe characters (CL 24466)
  hash/crc32: improve the AMD64 implementation using SSE4.2 (CL 24471)
  hash/crc32: improve the AMD64 implementation using SSE4.2 (CL 27931)
  hash/crc32: improve the processing of the last bytes in the SSE4.2 code for AMD64 (CL 24470)
  image/color: improve speed of RGBA methods (CL 31773)
  image/draw: optimize drawFillOver as drawFillSrc for opaque fills (CL 28790)
  math/big: 10%-20% faster float->decimal conversion (CL 31250, CL 31275)
  math/big: avoid allocation in float.{Add, Sub} when there's no aliasing (CL 23568)
  math/big: make division faster (CL 30613)
  math/big: use array instead of slice for deBruijn lookups (CL 26663)
  math/big: uses SIMD for some math big functions on s390x (CL 32211)
  math: speed up Gamma(+Inf) (CL 31370)
  math: speed up bessel functions on AMD64 (CL 28086)
  math: use SIMD to accelerate some scalar math functions on s390x (CL 32352)
  reflect: avoid zeroing memory that will be overwritten (CL 28011)
  regexp: avoid alloc in QuoteMeta when not quoting (CL 31395)
  regexp: reduce mallocs in Regexp.Find* and Regexp.ReplaceAll* (CL 23030)
  runtime: cgo calls are about 100ns faster (CL 29656, CL 30080)
  runtime: defer is now 2X faster (CL 29656)
  runtime: implement getcallersp in Go (CL 29655)
  runtime: improve memmove for amd64 (CL 22515, CL 29590)
  runtime: increase malloc size classes (CL 24493)
  runtime: large objects no longer cause significant goroutine pauses (CL 23540)
  runtime: make append only clear uncopied memory (CL 30192)
  runtime: make assists perform root jobs (CL 32432)
  runtime: memclr perf improvements on ppc64x (CL 30373)
  runtime: minor string/rune optimizations (CL 27460)
  runtime: optimize defer code (CL 29656)
  runtime: remove a load and shift from scanobject (CL 22712)
  runtime: remove defer from standard cgo call (CL 30080)
  runtime: speed up StartTrace with lots of blocked goroutines (CL 25573)
  runtime: speed up non-ASCII rune decoding (CL 28490)
  strconv: make FormatFloat slowpath a little faster (CL 30099)
  strings: add special cases for Join of 2 and 3 strings (CL 25005)
  strings: make IndexRune faster (CL 28546)
  strings: use AVX2 for Index if available (CL 22551)
  strings: use Index in Count (CL 28586)
  syscall: avoid convT2I allocs for common Windows error values (CL 28484, CL 28990)
  text/template: improve lexer performance in finding left delimiters (CL 24863)
  unicode/utf8: optimize ValidRune (CL 32122)
  unicode/utf8: reduce bounds checks in EncodeRune (CL 28492)
[0] https://github.com/golang/go/blob/master/doc/go1.8.txt
by grabcocque on 11/19/16, 3:56 PM
So the language has now spent getting on for 18 months significantly slower than it used to be, and nobody seems to have a real issue with this?