The SHA-NI SHA-256 instructions have been shown to provide a ~4x increase in hash rate on newer amd64 systems compared with the AVX2 implementation. A transliteration of the Linux implementation shows up to a 3.79x increase in hash rate in benchmarks.
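For readers who want to check the hash rate on their own hardware, the following is a minimal sketch of a throughput measurement using only the standard library. The helper name hashRate and the input sizes are illustrative assumptions, not taken from this change. The sketch measures whichever SHA-256 implementation the running Go toolchain selects for the CPU, so comparing implementations would require running it on builds with and without the change.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// hashRate is a hypothetical helper: it reports SHA-256 throughput in MB/s
// for inputs of the given size, using whichever implementation the Go
// runtime picks on this machine.
func hashRate(size int) float64 {
	buf := make([]byte, size)
	res := testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // record bytes processed per iteration
		for i := 0; i < b.N; i++ {
			sha256.Sum256(buf)
		}
	})
	if res.T <= 0 {
		return 0
	}
	// Total bytes hashed divided by elapsed time, in megabytes per second.
	return float64(res.Bytes) * float64(res.N) / 1e6 / res.T.Seconds()
}

func main() {
	// Input sizes are arbitrary; small inputs emphasize per-call overhead,
	// large inputs emphasize the block function itself.
	for _, size := range []int{8, 1024, 8192} {
		fmt.Printf("%5d bytes: %.1f MB/s\n", size, hashRate(size))
	}
}
```

The absolute numbers depend heavily on the CPU, so only the ratio between two builds on the same machine is meaningful.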
Q: Linux is GPL. Is it OK to bring this into Go?
A: It appears so. The Linux implementation file is dual-licensed under BSD 3-Clause and GPL. The history of that file shows that all relevant changes have come from Intel employees, and Intel has signed the CLA. In fact, the current AVX2 implementation submitted by Russ Cox is called out as arriving from the same source.
Q: Was the Assembly Policy followed?
A: I believe so, even to the point of leaving a measurable (though only ~1%) performance gain on the table in exchange for simpler constant table access and sharing that table with the existing AVX2 implementation.