I've been using heavy use of cutout transparency, is my performance doomed?

astracat111 · Jul 13, 2018

I've heard from multiple sources now that extensive use of cutout transparency has a higher cpu or gpu cost, I'm not quite sure I remember exactly what was said, but it was along the line of that the image would have to be drawn over twice or something.

At this stage in my game, there's really no way of going back from this, as I've modeled all of the trees and characters to make heavy use of transparency. I'm using as many optimization techniques as I can to reduce draw calls like billboarding trees, batching and combining objects.

I've designed my game to run in VR and I would ideally like to target the Oculus Go with some kind of cut down version. Due to the cutout transparency shader being depended upon so heavily, it's hard to actually bake lighting into the scene, so for a mobile version of the game I was thinking of using blob shadows entirely.

What I've found from a good amount of testing, being the noobie that I am, is that it's most all in dynamic lighting and shadows.

Here's an example comparison video of Drive Club VR vs Drive Club for the PS4:

I'm trying to take notes of this and it's inspired me to re-examine techniques that could be used for further optimization.

Any comments, help or whatever would be greatly appreciated.

hippocoder · Jul 13, 2018

Cutout is only a problem on mobiles. Are you deploying to mobile? did you test it? do not assume in a world where the hardware and engine constantly evolves.

AcidArrow · Jul 14, 2018

He did mention an Oculus Go, which more or less has mobile hardware, so there's a big chance Alpha Testing/Discard should be avoided.

But! Even if the "avoid alpha testing" advice is generally sound for mobile (and the opposite is true for most other platforms), every gpu is different. I looked a bit for info specific about the Oculus Go, but I didn't find too much.

So my advice is:

1. If you are targeting very specific devices, look into specific information for the specific hardware.

2. Test on an actual device. Maybe Alpha testing is the wrong way to go about it, but maybe the hardware is powerful enough to handle it, even if it's less than optimal. You don't want to completely restructure your game's look only to find out that it was performing fine anyway, right?

astracat111 · Jul 15, 2018

@AcidArrow

This is something that I don't understand....How did games in the 90s like Xenogears make such an intense use of cutout graphics and yet today this seems to be such a resource hog for mobile gpus? I guess they could have designed those mobile gpus to handle it as a feature but left it out? Is it just that the resolution was so low in older gpus that they could handle it?

This is why I was thinking maybe I could do what some PSVR game companies have been doing and lower the supersampling to be a scaled 720p then scale to the 1440p needed?

Otherwise, maybe there's a way in a modelling program I could somehow convert the pixel art to be 3D models? ...Yeah not sure of any solutions.

AcidArrow · Jul 15, 2018

About why it's slow, it's an architectural thing. I don't remember the exact details, a quick google-ing found me this:

For PowerVR hardware, and other GPUs that use tile-based rendering, using discard means that the TBR can no longer assume that every fragment drawn will become a pixel. This assumption is important because it allows the TBR to evaluate all the depths first, then only evaluate the fragment shaders for the top-most fragments. A sort of deferred rendering approach, except in hardware.
Click to expand...

I mean look it up for more accurate information, but if I remember it correctly, they sort of do depth evaluation before any frag shaders run. So it does the depth stuff, then later an alpha test shader says "actually for these pixels the depth is something else", so it has to go back and change the depth, which can be slow.

astracat111 · Jul 15, 2018

AcidArrow said: ↑

About why it's slow, it's an architectural thing. I don't remember the exact details, a quick google-ing found me this:

I mean look it up for more accurate information, but if I remember it correctly, they sort of do depth evaluation before any frag shaders run. So it does the depth stuff, then later an alpha test shader says "actually for these pixels the depth is something else", so it has to go back and change the depth, which can be slow.
Click to expand...

Very complex, but makes sense I guess. I didn't know all GPUs were built so differently internally.

AcidArrow · Jul 15, 2018

astracat111 said: ↑

Very complex, but makes sense I guess. I didn't know all GPUs were built so differently internally.
Click to expand...

I think it was since all early mobile GPUs were fillrate bound (and a lot of them still are), they thought if they did the depth first the could then do frag only for the visible pixels, saving on fillrate.

It's kinda insane, looking into this stuff, but there's usually a sensible reason and you can make some sense of it.

The really annoying stuff is usually when after spending hours looking why something works the way it does the answer is "there is a bug somewhere and a weird workaround somewhere else and the results are whatever".

astracat111 · Jul 15, 2018

AcidArrow said: ↑

I think it was since all early mobile GPUs were fillrate bound (and a lot of them still are), they thought if they did the depth first the could then do frag only for the visible pixels, saving on fillrate.

It's kinda insane, looking into this stuff, but there's usually a sensible reason and you can make some sense of it.

The really annoying stuff is usually when after spending hours looking why something works the way it does the answer is "there is a bug somewhere and a weird workaround somewhere else and the results are whatever".
Click to expand...

Lol, I see what you mean.

So, like, the Playstation 1's GPU was just designed to handle 2D so it could process sprite graphics in a more efficiently designed manner than say....a Snapdragon I guess. Well, as it's been stated, it's my Samsung S6 that had the trouble, while my laptop with an A8's GPU could handle it fine.

I suppose that it's not only rendering the 2D sprites, but the lighting and shadows for cutout graphics that are taking up all of the performance? I got much better performance when using blob shadows, but now due to graphics quality of all of these modern games I've started using bloom, depth of field and color grading. This stuff must take up way too much cpu for a mobile processor that's beneath an S8 or something.

I wonder how the Oculus Go does with all of this. It's disappointing to realize that it'd be better off to just develop for the Rift or get a cheap Mixed Reality headset for now...I see that Oculus Go store and I see opportunity right now, with it's cheaper price.

bgolus · Jul 16, 2018

Really it comes down to depth buffers. The depth buffer is how modern GPUs sort the pixels of opaque objects, and handle intersections of transparent objects with opaque objects. For each pixel of an opaque object that's rendered, a depth is also stored. When another object renders if the depth stored for each is closer to the camera than the object about to be rendered, it skips rendering those pixels. This cuts down on overdraw (rendering of the same pixel multiple times).

Like @AcidArrow noted, many mobile GPUs (including the Adreno in the Oculus Go) use some form of in hardware tiled deferred rendering which uses a depth only pre-pass to accelerate rendering. Doing a depth pre-pass entirely removes any chance of overdraw for opaque objects which would otherwise happen on complex objects or intersections. Being able to skip the fragment shader during this pre-pass speeds things up significantly, and is a factor. But this isn't the only problem. The depth buffer itself is usually in a compressed / optimized format that makes assumptions about all depth data being a constant plane across an entire triangle. When using a cutout shader those optimizations are disabled which makes that object more expensive to render, and all objects rendered after more expensive to render. This happens on the PC too btw, but those GPUs are so fast that it's not usually as obvious.

The PS1 had no depth buffer. Each tri was pre-sorted and rendered like sprites. Ultimately it didn't matter if a triangle was using a solid texture or not, and masked bits, clipping, or dithering were all first citizens of the GPU. Alpha blending / true transparency was possible, but expensive, and thus were generally avoided. This is significantly different from modern mobile GPUs where alpha blending is often preferable to alpha testing.

Search Unity

I've been using heavy use of cutout transparency, is my performance doomed?

astracat111

hippocoder

Digital Ape

AcidArrow

astracat111

AcidArrow

astracat111

AcidArrow

astracat111

bgolus

Search Unity

Unity ID

Useful Searches

I've been using heavy use of cutout transparency, is my performance doomed?

Digital Ape