[t:/]$ Knowledge_

AVX2 optimization

2017/03/24

A while back I found out that the -O3 option is more efficient than trying to use SIMD and AVX2 directly by hand... sorry, my time, that was a lot of wasted effort..

Today I implemented an MLP and built it with -O3.

Looking at the generated assembly afterwards, it only uses SSE. In other words, it only uses the 128-bit xmm registers.

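Building with -O3 -mavx2 and opening up the assembly shows that it goes all the way to AVX2. It uses not only the v-prefixed (VEX-encoded) instructions but also the 256-bit ymm registers.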
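The xmm vs ymm difference is easier to see on a tiny kernel than on a whole MLP. The sketch below is a made-up example: the function name `saxpy` and its signature are assumptions for illustration, not code from mlp2.c, and it assumes an x86-64 target.

```c
#include <stddef.h>

/* Hypothetical kernel (not from mlp2.c): y[i] += a * x[i].
 * restrict promises the compiler that x and y do not overlap,
 * so it does not need a runtime overlap check before vectorizing. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiling this with `gcc -O3 -S saxpy.c -o sse.s` and again with `gcc -O3 -mavx2 -S saxpy.c -o avx2.s` and comparing the two files should show packed instructions such as mulps/addps on xmm registers in the first and vmulps/vaddps on ymm registers in the second. The exact output depends on the GCC version.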

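I measured the run time of the MLP I wrote today.

-O3 : 1 min 46 s
-O3 -mavx2 : 1 min 41 s

Marginal... 5 seconds out of 106, a bit under 5%. I'll need to increase the load and take a closer look at the vectorized regions..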
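When the whole-program difference is only a few seconds, timing just the region of interest can help separate it from everything else the program does. A minimal sketch, assuming a POSIX system with `clock_gettime`; the names `elapsed_sec`, `time_reps`, and the stand-in workload `bump` are all hypothetical and not part of mlp2.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime under strict -std= modes */
#endif
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
static double elapsed_sec(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Call fn(arg) reps times and return the total wall-clock seconds. */
double time_reps(void (*fn)(void *), void *arg, int reps)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_sec(t0, t1);
}

/* Stand-in workload (hypothetical): bumps a counter so calls are observable.
 * In practice this would be a wrapper around the loop being measured. */
void bump(void *arg)
{
    ++*(long *)arg;
}
```

Raising `reps` is one simple way to increase the load on the measured region without touching the rest of the program.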

So. Who is going to read this?

bruno : Looks like Instagram
riot : The singularity is coming
andy : There's a flyer here
kalxin : That's just how it is

Another approach is to look at the vectorization reports directly and optimize stitch by stitch.

gcc -O3 -o mlp2 mlp2.c -lm -mavx2 -fopt-info-vec-missed

With -fopt-info-vec instead, only the regions that were vectorized get reported..

With -fopt-info-vec-missed as above, messages like these come out..

mlp2.c:214:2: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:239:20: note: not vectorized: not enough data-refs in basic block.
mlp2.c:219:5: note: not vectorized: not enough data-refs in basic block.
mlp2.c:244:46: note: not consecutive access _19 = *_18;

mlp2.c:244:46: note: not consecutive access train_label.16_16 = train_label;

mlp2.c:244:46: note: Failed to SLP the basic block.
mlp2.c:244:46: note: not vectorized: failed to find SLP opportunities in basic block.
mlp2.c:244:46: note: not consecutive access _19 = *_18;

mlp2.c:244:46: note: not consecutive access train_label.16_16 = train_label;

mlp2.c:244:46: note: Failed to SLP the basic block.
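
Most of these notes come from GCC's basic-block (SLP) vectorizer: "not enough data-refs" means the block had too few memory references to bother with, and "not consecutive access" means the references it did find were not at adjacent addresses. Since the code around mlp2.c:244 is not shown here, the following is only a sketch of a loop shape that tends to be easy on the vectorizer. The function `layer_forward`, its parameters, and the input-major weight layout are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical dense-layer forward pass, not taken from mlp2.c.
 * w is stored input-major: w[i * n_out + j] connects input i to output j,
 * so the inner loop walks w and out with stride 1. */
void layer_forward(size_t n_in, size_t n_out,
                   const float *restrict in,
                   const float *restrict w,
                   const float *restrict bias,
                   float *restrict out)
{
    for (size_t j = 0; j < n_out; j++)
        out[j] = bias[j];

    for (size_t i = 0; i < n_in; i++) {
        const float x = in[i];               /* loop-invariant scalar */
        for (size_t j = 0; j < n_out; j++)
            out[j] += x * w[i * n_out + j];  /* consecutive loads and stores */
    }
}
```

The inner loop is written as an element-wise update rather than a dot product on purpose: a float sum reduction changes the order of additions when vectorized, so GCC generally leaves it scalar at -O3 unless something like -ffast-math or -fassociative-math is given.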




[t:/] is not "technology - root". dawnsea, rss