I'm not a total newbie when it comes to shaders, but for the past week I've been struggling with the performance of a water shader I've been working on. There doesn't seem to be anything heavy going on in the pixel shader, yet it takes about 3 ms to render at 1080p on my 1050 Ti. The shader also uses tessellation and vertex processing, but even if I disable tessellation and bypass the vertex processing, it still takes the same amount of time to render. I found this presentation (http://developer.download.nvidia.co...vents/cgdc15/CGDC2015_ocean_simulation_en.pdf), where they have even more features in their water shader, yet it renders in 0.5 ms(!) on a GTX 770 (which is only slightly faster than a 1050 Ti), so there's definitely something wrong with my code somewhere (although they don't mention the resolution those timings were taken at, so I might actually be wrong and this is adequate performance for my GPU).

Here's the relevant pixel processing code:

Code (CSharp):
void Frag(v2f input, out float4 outColor : SV_Target0)
{
    outColor = 1;

    const float3 geomTangentWS = normalize(input.tangentWS);
    const float3 geomBinormalWS = normalize(input.binormalWS);
    const float3 geomNormalWS = normalize(cross(geomBinormalWS, geomTangentWS));
    const float3x3 local2WorldTranspose = float3x3(
        geomTangentWS,
        geomBinormalWS,
        geomNormalWS
    );

    const float3 blendedPos = (input.positionWSForUV + input.positionWS) * 0.5;
    const float3 V = GetWorldSpaceNormalizeViewDir(input.positionRWS);

    float2 normalVelocity = 0;
    const float3 normalRaw = GetCombinedNormals(blendedPos, V, geomNormalWS, normalVelocity);
    const float3 normal = ScaleNormalizeNormal(normalRaw, 0.12 * _NormalStrength);
    const float3 normalWS = TransformNormal(normal, local2WorldTranspose);

    const float sceneDepth = LinearEyeDepth(SampleCameraDepth((input.screenPosition.xy + normalWS.xz * 0.18) / input.screenPosition.w), _ZBufferParams);
    const float depthDiff = sceneDepth - input.position.w;
    const float depthNormalCurl = max(0, 1 - exp2(-max(0, depthDiff) * 4));

    const int probeIndexFixed = abs(_PlanarReflectionProbeIndex) - 1;
    const float3 R = reflect(V, normalize(lerp(normalWS, normalize(float3(normalWS.x, -9.0, normalWS.z)), 1 - depthNormalCurl)));
    const float3 reflPos = input.positionWSForUV + 9999999 * R;
    const float3 ndc = ComputeNormalizedDeviceCoordinatesWithZ(reflPos, _Env2DCaptureVP[probeIndexFixed]);
    const float distReflRough = 1 - exp(-length(input.positionRWS.xyz) * 0.0031);
    const float3 reflection = SAMPLE_TEXTURE2D_ARRAY_LOD(_Env2DTextures, s_trilinear_clamp_sampler, ndc.xy, probeIndexFixed, distReflRough * 8).rgb * GetCurrentExposureMultiplier() * _ReflectionBoost;

    const float fresnel = pow(saturate(1 - max(0, dot(V, normalWS))), _FresnelPower) * 0.98 + 0.02;

    const float depthFade = max(0, 1 - exp2(-max(0, depthDiff) * _DepthFactor * 0.8));
    const float depthFadeDist = max(0, 1 - exp2(-max(0, depthDiff) * 1));
    const float3 sceneColorA = SampleCameraColor((input.screenPosition.xy + normalWS.xz * 0.28 * 1.0 * depthFadeDist) / input.screenPosition.w, 0) * float3(0.0, 0.3, 1);
    const float3 sceneColorB = SampleCameraColor((input.screenPosition.xy + normalWS.xz * 0.28 * 1.8 * depthFadeDist) / input.screenPosition.w, 0) * float3(1.0, 0.7, 0);
    const float depthFadeLightFilter = max(0, 1 - exp2(-max(0, depthDiff) * _DepthFactor * 3.5));
    const float3 sceneColor = lerp(sceneColorA + sceneColorB, (sceneColorA + sceneColorB) * (0.2 + normalize(_SubsurfaceColor.rgb) * 0.8), depthFadeLightFilter);

    const float3 diffuse = input.albedo * input.diffuse;
    const float3 refraction = lerp(sceneColor, diffuse, depthFade);
    outColor.rgb = lerp(refraction, reflection, fresnel);

    uint2 tileIndex = uint2(input.position.xy) / GetTileSize();
    PositionInputs posInput = GetPositionInput_Stereo(input.position.xy, _ScreenSize.zw, input.position.z, input.position.w, input.positionRWS.xyz, tileIndex, unity_StereoEyeIndex);
    float4 fog = EvaluateAtmosphericScattering(posInput, V);
    outColor.rgb = outColor.rgb * saturate((1 - fog.a) + (1 - depthFade)) + fog.rgb * depthFade;
}

And here's the PIX analysis of this shader/draw call (I honestly don't know how to interpret it - good? bad?):

GetCombinedNormals just samples and unpacks ONE normal, nothing fancy there. And it's for HDRP, so it uses some macros from HDRP, like EvaluateAtmosphericScattering, SampleCameraColor, ComputeNormalizedDeviceCoordinatesWithZ, etc. But those are not the source of the problem.

I'm pretty much on the verge of a mental breakdown, please help.
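In case it matters, GetCombinedNormals boils down to something like this (a simplified sketch, not the exact code; _WaterNormalMap and _NormalTiling are placeholder names, and I'm using HDRP's UnpackNormalAG as the unpack step):

```hlsl
// Simplified sketch of GetCombinedNormals: one texture fetch plus an unpack, nothing else.
// _WaterNormalMap / _NormalTiling are placeholder names for illustration.
float3 GetCombinedNormals(float3 blendedPos, float3 V, float3 geomNormalWS, inout float2 normalVelocity)
{
    float2 uv = blendedPos.xz * _NormalTiling;                       // planar projection of the blended position
    float4 packed = SAMPLE_TEXTURE2D(_WaterNormalMap, s_linear_repeat_sampler, uv);
    return UnpackNormalAG(packed, 1.0);                              // single unpack, no blending
}
```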
Thanks. I removed pretty much all of the normalizes except one. Slightly better (-0.4 ms from the previous version), but it still takes 2.8 ms.
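For example, the TBN construction now uses the interpolated vectors as-is (a sketch of the change; whether the skipped normalizes are visually acceptable depends on how much the interpolation deforms them):

```hlsl
// Redundant normalizes dropped from the TBN build - interpolated tangent/binormal used directly.
const float3 geomTangentWS  = input.tangentWS;
const float3 geomBinormalWS = input.binormalWS;
const float3 geomNormalWS   = cross(geomBinormalWS, geomTangentWS);
```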
Even removing all of the texture reads only knocks it down to 2 ms. I'm absolutely baffled by this - why is it performing so poorly?
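To be clear, by "removing all the texture reads" I mean replacing every sample with a constant, roughly like this (a diagnostic sketch to isolate ALU cost, not the real shader; the actual change touches every SampleCameraDepth/SampleCameraColor/SAMPLE_TEXTURE2D_ARRAY_LOD call):

```hlsl
// Texture reads replaced with constants so only ALU and raster cost remain (diagnostic only).
const float sceneDepth   = 100.0;                 // was LinearEyeDepth(SampleCameraDepth(...), _ZBufferParams)
const float3 reflection  = float3(0.5, 0.6, 0.7); // was SAMPLE_TEXTURE2D_ARRAY_LOD(_Env2DTextures, ...)
const float3 sceneColorA = float3(0.0, 0.3, 1.0); // was SampleCameraColor(...) * float3(0.0, 0.3, 1)
const float3 sceneColorB = float3(1.0, 0.7, 0.0); // was SampleCameraColor(...) * float3(1.0, 0.7, 0)
```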