Question for Aras || Farfarer regarding hypothetical performance difference between implementations

IJM · Oct 23, 2014

I hope this is appropriate, and I think it will benefit many others to know.

(Unity DirectX 9)
I have an NxN mesh plane, and I want to move its vertices every frame.
Which of these two implementations is faster, how much faster is it, and why?

1. Passing the vertices and normals to a Mesh instance. (Ignore the time needed to calculate the normals; I can handle that very efficiently outside Mono.)

2. Passing a height map to the shader as a texture, and calculating the normals inside the shader. (Assume that I'll have to update the bitmap every frame as well.)
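
For reference, calculating normals from a height map is commonly done with central differences over neighbouring height samples. Below is a minimal sketch of that idea in Python rather than shader code. It assumes a regular grid with y as the up axis; the function name `heightmap_normal` and the `spacing` parameter are hypothetical and only serve the illustration.

```python
import math

def heightmap_normal(heights, x, z, spacing=1.0):
    """Approximate the unit normal at grid cell (x, z) of a height map.

    heights is a list of rows, indexed as heights[z][x]; y is up.
    Neighbour indices are clamped at the grid border, so border cells
    fall back to a one-sided difference.
    """
    rows, cols = len(heights), len(heights[0])
    x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
    z0, z1 = max(z - 1, 0), min(z + 1, rows - 1)

    # Slopes of the surface y = h(x, z) along each grid axis.
    dhdx = (heights[z][x1] - heights[z][x0]) / ((x1 - x0) * spacing) if x1 > x0 else 0.0
    dhdz = (heights[z1][x] - heights[z0][x]) / ((z1 - z0) * spacing) if z1 > z0 else 0.0

    # The unnormalized normal of y = h(x, z) is (-dh/dx, 1, -dh/dz).
    nx, ny, nz = -dhdx, 1.0, -dhdz
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)
```

In a vertex shader the same four neighbour samples would come from texture fetches, which is part of why the second option makes the shader less straightforward.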

Thank you very much; an answer to this question would save me a lot of time.

p.s.
I prefer the first solution, since the shader can be much more straightforward.
p.p.s.
The more technical details you provide me with, the better.

Daniel_Brauer · Oct 23, 2014

Why don't you specify the actual parameters (texture/mesh size, actual target hardware) and run some tests? Aras and Farfarer (and everyone else on here, for that matter) are not computers. This is a question best answered by a computer.

Farfarer · Oct 23, 2014

I have no idea, not an expert, I just like messing with shaders

But yeah, Daniel's right; profile it and find out, I couldn't call it either way. You might even find there's a speed crossover point when N is above/below a certain value.

But if you're saying you can calculate the normals extremely efficiently, the first option sounds like the better one? It really depends on the target platforms and what else is happening in the scene, and on whether you'll be CPU or GPU bound...

IJM · Oct 23, 2014

Farfarer, thanks for the feedback.
I wasn't looking for exact numbers, just an opinion on which should be faster.
I'd like to get one from Aras as well, since he knows the internals (what the Unity rendering engine does at every step of the way in those two cases).

Farfarer · Oct 23, 2014

I don't think what Unity does internally will matter too much. You're not really going to be affecting it in a "negative" fashion either way (i.e. you won't be doing stuff that shouldn't be done, as it were).

IJM · Oct 23, 2014

Unfortunately, that's incorrect. I've written a few engines of my own in OpenGL, nothing too complicated, with very basic renderers.

For example, when rendering a mesh you can use display lists and send the mesh data to VRAM only once. The benefit is speed; the disadvantage is that you can't modify vertices, normals or UVs after you have done that. The only thing you can do is change the transformation matrix of the object.

If you wish to change the data, you have to send all of the data every frame, and that is much slower.

When using shaders that problem disappears, since they are executed on the GPU, so you can use display lists and move the vertices "within the graphics card".
But still, you have to get that data somehow, and that's with textures.
If you want to move the vertices with the data from the texture that is changing every frame, you have to get that texture to the GPU somehow, every frame.

I'm pretty sure that Unity is using, at least for the OpenGL part, vertex buffer objects, which are a completely different story.

The point is...
Whatever you do at the API level has a certain speed, and there are a few ways of doing the same or a similar thing;
someone who works on the engine itself usually knows the difference in speed.

p.s.
Having thought about this a bit more, I believe that vertex data is the correct way to go, since I will calculate the normals on the GPU with OpenCL.

Mikeysee · Oct 23, 2014

Does Unity even have Vertex Texture Fetch? I remember looking into GPU-based particle systems a while back and dismissing the idea because VTF wasn't there...

duke · Oct 26, 2014

Mikeysee said:

Does Unity even have Vertex Texture Fetch? I remember looking into GPU-based particle systems a while back and dismissing the idea because VTF wasn't there...

It most certainly does!

hippocoder · Oct 26, 2014

Pretty sure that once you have enough verts the GPU solution (vertex shader) is tonnes faster, but that depends on how much work you're going to do when you say "generate bitmap", so like others have said... profile.

drudiverse · Oct 29, 2014

If it's performance critical, i.e. 1000 vertices per frame: the PC can't handle rewriting many thousands of vertices per frame; it takes about 50 percent of the processing of a 2010 desktop PC. So that's the mesh rewrite. The shader can do 100 times more than that per frame on a static mesh if the processor is idle and not doing any work, but rewriting the vertices on the CPU and then sending the data to the shader already ties up the processor as much as just changing the mesh. The shader part of anything you do after the processor's work is pretty much free; 1 draw call per texture per shader costs the same whether the heightmap changes or is static, I think. If you're aiming for 2,000-10,000 vertex meshes, definitely aim for preprocessed heightmaps sent to a shader.

Oh yes, a very slow thing to send to the shader is a new mesh... that's why the limit is something like 2,000-vertex meshes per frame... it's the actual sending of the mesh to the shader that takes a massive performance hit. You can send a 65k mesh to a shader once and modify it many times with textures, but you can only update a few thousand vertices per frame on the graphics card from the processor while staying over 25 fps.

Incidentally, for very advanced answers to that question, Stack Exchange can be very informative.

CHPedersen · Oct 30, 2014

I agree with everyone else that this is best tested on actual hardware.

It's really hard to say which one will be faster, because they're both limited by rather heavy utilization of the most precious resource the pipeline has: bus bandwidth.

Solution 1 spends it by sending updated vertex data every frame.
Solution 2 spends it by sending an updated texture every frame.

Which one is worse is impossible to tell without exact parameters, but this transmission of data across the bus is almost certainly the bottleneck in both cases. I think the best solution would be 2, if you could generate that height map texture on the GPU itself instead of doing it CPU side, perhaps by rendering it into a RenderTexture somehow. Then you could utilize the shader's ability to quickly move those vertices around, without having to pass textures to the shader.
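
To make the bandwidth comparison concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not something taken from Unity: each vertex is taken to carry a position and a normal as six 32-bit floats, the height map is taken to hold one 32-bit value per texel at the same NxN resolution, and the function names are hypothetical. It counts raw payload only and ignores driver overhead, format conversion, and stalls.

```python
def mesh_upload_bytes(n, floats_per_vertex=6, bytes_per_float=4):
    """Raw bytes per frame for an NxN grid of vertices (position + normal)."""
    return n * n * floats_per_vertex * bytes_per_float

def heightmap_upload_bytes(n, bytes_per_texel=4):
    """Raw bytes per frame for an NxN height map with one value per texel."""
    return n * n * bytes_per_texel

def mb_per_second(bytes_per_frame, fps=60):
    """Sustained transfer rate in megabytes (10^6 bytes) per second."""
    return bytes_per_frame * fps / 1_000_000

if __name__ == "__main__":
    for n in (64, 256, 1024):
        mesh = mesh_upload_bytes(n)
        tex = heightmap_upload_bytes(n)
        print(n, mesh, tex, mb_per_second(mesh), mb_per_second(tex))
```

Under these assumptions the height map payload is a sixth of the vertex payload at equal resolution, but payload size alone does not decide which path is faster on a given machine, so this complements profiling rather than replacing it.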
