PowerVR Low Level GLSL Optimisation

PowerVR
Low Level GLSL Optimisation
이민웅
Shader Study
http://cdn.imgtec.com/sdk-documentation/PowerVR%20Low%20level%20GLSL%20Optimisation.pdf

Low Level GLSL Optimisation
• https://www.imgtec.com/blog/low-level-glsl-optimisation-for-
powervr/
• 원문
• PowerVR Series 6 기준
• Low Level 마지막 최적화용
• http://www.humus.name/Articles/Persson_LowLevelThinking.pdf
• 재원씨발표
• http://www.humus.name/Articles/Persson_LowlevelShaderOptimizat
ion.pdf
• 개발자가 놓칠 수 있는 저 수준 최적화

PowerVR Rogue architecture
• 셰이더를 실행하는 데 필요한 사이클 수에 따라 달라짐
• 아키텍처는 물론 구성에 따라 단일 사이클 내에서 USC (통합 쉐
이더 클러스터), ALU (산술 및 논리 유닛) 파이프 라인에서 여러
명령어를 실행하기위한 다양한 옵션을 제공
• 두 개의 F16 SOP (Sum of Products) 명령
• F32 - F16 변환 및 이동 / 출력 / 팩 명령을 모두 한 번에 실행
• FP32 / INT32 MAD / UNPACK 명령어, 테스트 (조건부) 명령어 및
move / output / pack 명령어와 함께 FP32 Multiply-Add 명령어
(MAD)를 한 사이클 만에 실행
• USC 코어를 활용하기 위해 MAD 형식으로 작성

USC(Unified Shader Cluster)
통합 쉐이더 클러스터
ALU (Arithmetic and Logic Unit)
산술 및 논리 유닛
SOP (Sum of Products)
Multiply-Add instruction (MAD)

MAD
• USC 코어를 가장 효과적으로 활용
• Multiply-Add 의 표현식
fragColor.x = (t.x + t.y) * (t.x - t.y); //2 cycles
{sop, sop, sopmov}
{sop, sop}
-->
fragColor.x = t.x * t.x + (-t.y * t.y); //1 cycle
{sop, sop}

Division
• 역수로 나누기를 작성하는 것
fragColor.x = (t.x * t.y + t.z) / t.x; //3
cycles
{sop, sop, sopmov}
{frcp}
{sop, sop}
-->
fragColor.x = t.y + t.z * (1.0 / t.x); //2
cycles
{frcp}
{sop, sop}

Sign
• -1 if x < 0, 1 if x > 0 를 sign으로 사용하는것 보다 조건식이
더 좋음
fragColor.x = sign(t.x) * t.y; //3 cycles
{mov, pck, tstgez, mov}
{sop, sop}
-->
fragColor.x = (t.x >= 0.0 ? 1.0 : -1.0) * t.y; //2 cycles
{sop, sop}

Rcp/rsqrt/sqrt
• sqrt(t.x) == t.x * inversesqrt(t.x); 같음
fragColor.x = sqrt(t.x) > 0.5 ? 0.5 : 1.0; //3 cycles
{frsq}
{frcp}
{mov, mov, pck, tstg, mov}
-->
fragColor.x = (t.x * inversesqrt(t.x)) > 0.5 ? 0.5 : 1.0; //2 cycles
{frsq}
{fmul, pck, tstg, mov}

Abs/Neg/Saturate
fragColor.xyz = -normalize(t.xyz); //6 cycles
{fmul, mov}
{fmad, mov}
{fmad, mov}
{frsq}
{fmul, fmul, mov, mov}
{fmul, mov}
-->
fragColor.xyz = normalize(-t.xyz); //7 cycles
{mov, mov, mov}
{fmul, mov}
{fmad, mov}
{fmad, mov}
{frsq}
{fmul, mov}
fragColor.x = abs(t.x * t.y); //2 cycles
{sop, sop}
{mov, mov, mov}
-->
fragColor.x = abs(t.x) * abs(t.y); //1 cycle
{sop, sop}
fragColor.x = -dot(t.xyz, t.yzx); //3 cycles
{sop, sop, sopmov}
{sop, sop}
{mov, mov, mov}
-->
fragColor.x = dot(-t.xyz, t.yzx); //2 cycles
{sop, sop, sopmov}
{sop, sop}
fragColor.x = 1.0 - clamp(t.x, 0.0, 1.0); //2 cycles
{sop, sop, sopmov}
{sop, sop}
-->
fragColor.x = clamp(1.0 - t.x, 0.0, 1.0); //1 cycle
{sop, sop}

Exp/Log
• PowerVR 에서는 2^n 명령어 지원
fragColor.x = exp2(t.x); //1 cycle
{fexp}
fragColor.x = log2(t.x); //1 cycle
{flog}
float pow( float x, float y )
{
return exp2(log2(x) * y); //3 cycles
{flog}
{sop, sop}
{fexp}
}

Sin/Cos/Sinh/Cosh
• 4 cycles 로 비용이 싸다
fragColor.x = sin(t.x); //4 cycles
{fred}
{fred}
{fsinc}
{fmul, mov} //+conditional 조건부
fragColor.x = sinh(t.x); //3 cycles
{fexp}
{sop, sop}

Asin/Acos/Atan/Degrees/Radians
• 하드웨어에서 지원 안함
• 매우 비용이 높다 사용하지 말자
fragColor.x = asin(t.x); //67 cycles
//USC code omitted due to length
fragColor.x = acos(t.x); //79 cycles
fragColor.x = atan(t.x); //12 cycles (lots of conditionals

Vector*Matrix
• 필요없는 계산은 하지말자
fragColor = t * m1; //4x4 matrix, 8 cycles
{mov}
{wdf}
{sop, sop, sopmov}
{sop, sop, sopmov}
{sop, sop}
{sop, sop, sopmov}
{sop, sop, sopmov}
{sop, sop}
fragColor.xyz = t.xyz * m2; //3x3 matrix, 4 cycles
{sop, sop, sopmov}
{sop, sop}
{sop, sop, sopmov}
{sop, sop}

Mixed Scalar/Vector math
• Normalize () / length () / distance
() / reflect () 등의 함수는 dot ()과
같은 함수를 많이 포함
• 두 작업에 공유 하위 표현식이 있다
는 것을 알면 사이클수를 줄일 수
있음
• 그러나 입력 순서가 허용하는 경우에
만 발생
fragColor.x = length(t-v); //7 cycles
fragColor.y = distance(v, t);
{sopmad, sopmad, sopmad, sopmad}
{sop, sop, sopmov}
{sop, sop, sopmov}
{sop, sop}
{frsq}
{frcp}
-->
fragColor.x = length(t-v); //9 cycles
fragColor.y = distance(t, v);
{mov}
{wdf}
{sop, sop, sopmov}
{sop, sop, sopmov}
{sop, sop}
{frsq}
{frcp}
{mov}

Operation grouping
• 스칼라연산은 그룹화가 중요
fragColor.xyz = t.xyz * t.x * t.y * t.wzx * t.z * t.w; //7 cycles
{sop, sop, sopmov}
{sop, sop, sopmov}
{sop, sop}
{sop, sop, sopmov}
{sop, sop}
{sop, sop, sopmov}
{sop, sop}
-->
fragColor.xyz = (t.x * t.y * t.z * t.w) * (t.xyz * t.wzx); //4 cycles
{sop, sop, sopmov}
{sop, sop, sopmov}
{sop, sop}
{sop, sop}

FP16 overview
• FP16 정밀도 및 변환
• 정밀도는 떨어져도 문제없음
• 최적화로 인해 변화가 있는지 없는지는 항상 체크
• 16-32비트 변환 비용이 무료
• FP16 SOP/MAD
• 잘 사용하면 단일 사이클로 만들수 있음 (1 cycle)

SOP/MAD FP16 pipeline
• 16-32비트 변환 비용이 무료
• SOP/MAD에서 다양한 사용법이 있음
4 MADs in one cycle.
fragColor.x = t.x * t.y + t.z;
fragColor.y = t.y * t.z + t.w;
fragColor.z = t.z * t.w + t.x;
fragColor.w = t.w * t.x + t.y;
//1 cycle
mediump vec4 fp16 = t;
highp vec4 res;
res.x = clamp(min(-fp16.y * abs(fp16.z), clamp(fp16.w, 0.0, 1.0) * abs(fp16.x)), 0.0, 1.0);
res.y = clamp(abs(fp16.w) * -fp16.z + clamp(fp16.x, 0.0, 1.0), 0.0, 1.0);
fragColor = res;
{sop, sop}

PowerVR Low Level GLSL Optimisation

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie PowerVR Low Level GLSL Optimisation

Ähnlich wie PowerVR Low Level GLSL Optimisation (20)

Mehr von 민웅 이

Mehr von 민웅 이 (18)

PowerVR Low Level GLSL Optimisation

Hinweis der Redaktion