Debugging an AVX2 core dump with GCC

Jul 22, 2021 · 3 min read · simd avx

Background

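It started when a colleague was implementing int4 support and one job in the CI pipeline kept failing with a core dump (GCC 4.8.5). After some initial investigation, we narrowed it down to the following minimal reproduction: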

```cpp
#include <immintrin.h>

class Test{
    public:
    Test(){
        tmp = _mm256_set_epi32(0,0,0,0,0,0,0,0);
    }
    private:
    __m256i tmp;
};
int main(){
    auto *tmp = new Test();
    return 0;
}
```

The GCC version is 4.8.5, and the compile command is:

```shell
g++ -std=c++11 -mavx2 a.cpp
```

The crash happens at `tmp = _mm256_set_epi32(0,0,0,0,0,0,0,0);`.

Yet the same code with the same compile options does not crash on GCC 7.3.

Initial investigation

Looking at the assembly, GCC 4.8.5 generates:

```asm
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, 32
        call    operator new(unsigned long)
        vpxor   xmm0, xmm0, xmm0
        vmovdqa YMMWORD PTR [rax], ymm0
        mov     eax, 0
        pop     rbp
        ret
```

Link here.

With GCC 7.3, on the other hand, the generated assembly is:

```asm
main:
        push    rbp
        mov     rbp, rsp
        push    r10
        sub     rsp, 8
        mov     esi, 32
        mov     edi, 32
        call    operator new(unsigned long, std::align_val_t)
        vpxor   xmm0, xmm0, xmm0
        vmovdqa YMMWORD PTR [rax], ymm0
        mov     eax, 0
        add     rsp, 8
        pop     r10
        pop     rbp
        ret
```

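Link here.

The two builds call different operator new overloads: under -std=c++17, GCC 7.3 calls the overload that takes an extra parameter of type std::align_val_t.

I also noticed that creating the object without new (for example, on the stack) does not crash either.

At this point it was fairly clear the problem was related to new. Both listings store the result with `vmovdqa YMMWORD PTR [rax], ymm0`, and this aligned 256-bit store requires `[rax]` to be 32-byte aligned; a misaligned address raises a general-protection exception, which the process sees as a segmentation fault. The plain operator new(unsigned long) only promises alignof(std::max_align_t) (16 bytes on x86-64), whereas the std::align_val_t overload is told that the type needs 32.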
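To see why the allocation path matters, here is a minimal sketch comparing a stack object with raw memory from the plain ::operator new(std::size_t). It assumes a hypothetical Block32 type (alignas(32), 32 bytes) as a stand-in for __m256i so it compiles without -mavx2; the helpers misalignment, stack_misalignment and heap_misalignments are illustrative names, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for __m256i: 32 bytes that require 32-byte alignment.
struct alignas(32) Block32 {
    unsigned char bytes[32];
};

// Remainder of an address modulo `align`; 0 means the address is aligned.
std::size_t misalignment(const void* p, std::size_t align) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % align);
}

// Stack objects: the compiler itself places them on an alignof(Block32) boundary.
std::size_t stack_misalignment() {
    Block32 local;
    return misalignment(&local, alignof(Block32));
}

// Raw memory from the plain ::operator new(std::size_t): only alignof(std::max_align_t)
// is promised, so each result is 0 or 16 on x86-64 depending on where the heap puts it.
std::vector<std::size_t> heap_misalignments(int count) {
    std::vector<void*> blocks;
    std::vector<std::size_t> result;
    for (int i = 0; i < count; ++i) {
        void* raw = ::operator new(sizeof(Block32));
        assert(misalignment(raw, alignof(std::max_align_t)) == 0);  // the guaranteed part
        result.push_back(misalignment(raw, alignof(Block32)));
        blocks.push_back(raw);  // keep blocks alive so the allocator returns distinct addresses
    }
    for (void* raw : blocks) ::operator delete(raw);
    return result;
}
```

Printing the values from heap_misalignments(8) on x86-64 with glibc may show 16 for some entries; those are the addresses where a vmovdqa store into a 32-byte member would fault. The stack value is always 0 because the compiler realigns the frame for over-aligned locals.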

Alignment rules for new

Then, pointed in the right direction by a senior colleague, I came across -faligned-new:

-faligned-new Enable support for C++17 new of types that require more alignment than void* ::operator new(std::size_t) provides. A numeric argument such as -faligned-new=32 can be used to specify how much alignment (in bytes) is provided by that function, but few users will need to override the default of alignof(std::max_align_t).

This flag is enabled by default for -std=c++17.

What this flag actually sets is __STDCPP_DEFAULT_NEW_ALIGNMENT__, whose default value is alignof(std::max_align_t) (16 on x86-64).

You can check its value with the following code:

```cpp
#include <immintrin.h>
#include <iostream>

class Test{
    public:
    Test(){
        tmp = _mm256_set_epi32(0,0,0,0,0,0,0,0);
    }
    private:
    __m256i tmp;
};
int main(){
    auto *tmp = new Test();
    // Alignment the compiler assumes plain operator new(std::size_t) provides
    std::cout << __STDCPP_DEFAULT_NEW_ALIGNMENT__ << std::endl;
    delete tmp;
    return 0;
}
```

Compile with:

```shell
g++ -std=c++17 -mavx2 -faligned-new=32 c.cpp
```

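What does setting it change?

The compiler uses the value of __STDCPP_DEFAULT_NEW_ALIGNMENT__ to decide which version of new to call. Specifically, if the type's alignment is greater than this value, it calls the version with an alignment parameter:

```cpp
operator new(unsigned long, std::align_val_t)
```

Otherwise it calls the version without one:

```cpp
operator new(unsigned long)
```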
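The selection rule can be observed directly with class-specific allocation functions that record which overload ran. This is a sketch under assumptions: Probe, g_last_was_aligned and allocation_used_aligned_new are hypothetical names, and the std::align_val_t overload is only declared when the compiler defines __cpp_aligned_new (C++17, or an explicit -faligned-new).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Set by Probe's allocation functions: true if the last allocation used the aligned overload.
static bool g_last_was_aligned = false;

// Hypothetical type whose alignment is a template parameter.
// Without aligned new, only instantiate it with Align <= alignof(std::max_align_t);
// otherwise memory from ::operator new(std::size_t) may be under-aligned.
template <std::size_t Align>
struct alignas(Align) Probe {
    unsigned char bytes[Align];

    // Selected when alignof(Probe) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__, or when aligned new is off.
    static void* operator new(std::size_t size) {
        g_last_was_aligned = false;
        return ::operator new(size);
    }
    static void operator delete(void* ptr) { ::operator delete(ptr); }

#if defined(__cpp_aligned_new)
    // Selected when alignof(Probe) > __STDCPP_DEFAULT_NEW_ALIGNMENT__.
    static void* operator new(std::size_t size, std::align_val_t al) {
        assert(static_cast<std::size_t>(al) == Align);  // the compiler passes alignof(Probe)
        g_last_was_aligned = true;
        return ::operator new(size, al);
    }
    static void operator delete(void* ptr, std::align_val_t al) { ::operator delete(ptr, al); }
#endif
};

// Allocates and frees one Probe<Align>; reports which overload the new-expression selected.
template <std::size_t Align>
bool allocation_used_aligned_new() {
    Probe<Align>* p = new Probe<Align>;
    delete p;
    return g_last_was_aligned;
}
```

With default settings, allocation_used_aligned_new<8>() returns false, and when aligned new is enabled, allocation_used_aligned_new<64>() returns true. Building with -faligned-new=64 would make the second call return false too, the same effect as -faligned-new=32 on a 32-byte-aligned type.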

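By this reasoning, on GCC 7.3 in C++17 mode, passing -faligned-new=32 should stop the compiler from calling the aligned new (alignof(Test) is 32, which is not greater than 32), so this build should crash as well.

...Yet in practice, compiling with the options below, nothing happened:

```shell
g++ -std=c++17 -mavx2 -faligned-new=32 c.cpp
```

Why??? Why no core dump?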

If you’re compiling in [c++17] mode only with a sufficiently recent compiler (e.g., GCC>=7, clang>=5, MSVC>=19.12), then everything is taken care by the compiler and you can stop reading.

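So the answer seemed to be that GCC 7 and later already take care of alignment for us. Strictly speaking, though, that holds for the default settings. -faligned-new=32 tells the compiler that plain operator new already returns 32-byte aligned memory, so it calls that overload without any extra alignment, while glibc's malloc on x86-64 only guarantees 16 bytes. Not crashing in this build is more likely a matter of where the heap happened to place the object than something the compiler guarantees.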

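By the same logic, GCC 6 should surely crash, right?

It turned out it didn't either...

Further investigation showed that GCC 4.9.4 still crashes, while GCC 5 no longer does.

I suspect GCC 5 fixed something, or that the glibc paired with GCC 5 did.

I haven't found it yet; this will be filled in later.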
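Since whether vmovdqa faults depends on the address the allocator happens to return, one way to make the problem reproducible regardless of GCC version is to check alignment explicitly in the constructor. A minimal sketch, assuming a hypothetical Guarded type and is_aligned helper in place of the article's Test class; it zeroes a byte array instead of calling _mm256_set_epi32 so it builds without -mavx2. Compilers may assume `this` is correctly aligned, so in optimized builds the check can be folded away; it is intended for -O0 debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `p` lies on an `align`-byte boundary.
inline bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Hypothetical stand-in for the article's Test class: alignas(32) models the __m256i member.
struct alignas(32) Guarded {
    unsigned char bytes[32];

    Guarded() {
        // Fail deterministically in debug builds instead of waiting for vmovdqa to fault.
        assert(is_aligned(this, alignof(Guarded)) && "Guarded object is under-aligned");
        for (unsigned char& b : bytes) b = 0;  // in place of tmp = _mm256_set_epi32(0, ...)
    }
};
```

When a Guarded is allocated through a plain operator new that returns an address that is 16 mod 32 (as in the -std=c++11 build above), the assertion fires on every such allocation instead of depending on how a particular GCC or glibc version laid out the heap.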

Solution

Aligning the allocation manually fixes it:

```cpp
#include <immintrin.h>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <cstddef>
#include <new>        // std::bad_alloc

// Base class that routes new/delete of derived classes through posix_memalign,
// so objects get ALIGNMENT-byte alignment even where plain operator new gives less.
template <size_t ALIGNMENT>
struct alignas(ALIGNMENT) AlignedNew {
  static_assert(ALIGNMENT > 0, "ALIGNMENT must be positive");
  static_assert((ALIGNMENT & (ALIGNMENT - 1)) == 0,
      "ALIGNMENT must be a power of 2");
  static_assert((ALIGNMENT % sizeof(void*)) == 0,
      "ALIGNMENT must be a multiple of sizeof(void *)");
  static void* operator new(size_t count) { return Allocate(count); }
  static void* operator new[](size_t count) { return Allocate(count); }
  static void operator delete(void* ptr) { free(ptr); }
  static void operator delete[](void* ptr) { free(ptr); }

 private:
  static void* Allocate(size_t count) {
    void* result = nullptr;
    // posix_memalign returns 0 on success and an error number on failure
    const auto alloc_failed = posix_memalign(&result, ALIGNMENT, count);
    if (alloc_failed) throw ::std::bad_alloc();
    return result;
  }
};

class Test: public AlignedNew<32> {
    public:
    Test(){
        tmp = _mm256_set_epi32(0,0,0,0,0,0,0,0);
    }
    private:
    __m256i tmp;
};

int main(){
    auto *tmp = new Test();  // allocated via AlignedNew<32>::operator new
    delete tmp;              // released via AlignedNew<32>::operator delete
    return 0;
}
```

一次avx2在gcc上core dump的排查经历

背景

初步排查

new的对齐规则

解决办法

参考链接