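A while back I found out that simply turning on -O3 is more effective than trying to use SIMD and AVX2 by hand... Sorry, my poor time. So much pointless digging..
Today I implemented an MLP and compiled it with -O3.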
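Looking at the generated assembly afterwards, it only uses SSE. In other words, it only touches the 128-bit xmm registers.
Compile with -O3 -mavx2 and open up the assembly again, and it goes all the way to AVX2. It not only uses the v-prefixed (VEX-encoded) instructions but also the 256-bit ymm registers.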
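If you want to see the xmm/ymm difference for yourself, here is a minimal sketch. The function axpy and the file name axpy.c are made up for illustration and are not taken from mlp2.c. It is just a contiguous element-wise loop of the kind the auto-vectorizer usually handles.

```c
#include <stddef.h>

/* Hypothetical example, not from mlp2.c:
 * y[i] += alpha * x[i] over contiguous arrays.
 * restrict promises the compiler that x and y do not overlap. */
void axpy(float *restrict y, const float *restrict x, float alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

Compiling it twice, with gcc -O3 -S axpy.c -o axpy_sse.s and then gcc -O3 -mavx2 -S axpy.c -o axpy_avx2.s, should typically show mulps/addps on xmm registers in the first file and vmulps/vaddps on ymm registers in the second, though the exact output depends on the GCC version.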
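I measured how long today's MLP takes to run.
-O3 : 1 min 46 s
-O3 -mavx2 : 1 min 41 s
Barely anything... 5 seconds out of 106, so roughly 5%. I should probably increase the load and take a closer look at the regions that got vectorized..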
So. Who is going to read this post?
bruno : Feels like Instagram
riot : The singularity is coming
andy : Oh, there's a flyer here
kalxin : That's just how it is
There is also the option of looking at the vectorization results directly and optimizing stitch by stitch.
gcc -O3 -o mlp2 mlp2.c -lm -mavx2 -fopt-info-vec-missed
This prints messages like the ones below..
(If you pass -fopt-info-vec instead, only the regions that were actually vectorized are reported..)
mlp2.c:214:2: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:244:46: note: not consecutive access _19 = *_18;
mlp2.c:244:46: note: not consecutive access train_label.16_16 = train_label;
mlp2.c:244:46: note: Failed to SLP the basic block.
mlp2.c:244:46: note: not vectorized: failed to find SLP opportunities in basic block.
mlp2.c:244:46: note: not consecutive access _19 = *_18;
mlp2.c:244:46: note: not consecutive access train_label.16_16 = train_label;
mlp2.c:244:46: note: Failed to SLP the basic block.