
Question for Aras || Farfarer regarding hypothetical performance difference between implementations

Discussion in 'Shaders' started by IJM, Oct 23, 2014.

  1. IJM

    Joined:
    Aug 31, 2010
    Posts:
    143
    I hope this is appropriate; I think the answer will benefit many others.

    (Unity, DirectX 9)
    Suppose I have an N×N mesh plane and want to move its vertices every frame.
    Which of these two implementations is faster, by how much, and why?
    1. Passing the vertices and normals to a Mesh instance every frame. (Ignore the time needed to calculate the normals; I can do that very efficiently outside Mono.)
    2. Passing a height map to the shader as a texture and calculating the normals inside the shader. (This assumes I'll have to update the bitmap every frame as well.)
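    Both options need per-vertex normals derived from the heights; option 1 computes them on the CPU, option 2 would do the equivalent per vertex in the shader by sampling neighbouring texels. A minimal CPU-side sketch, assuming a square grid indexed as heights[z, x] with a uniform vertex spacing `cell` (a hypothetical parameter), y-up, and central differences via numpy.gradient (one-sided at the borders). It is an illustration, not Unity's or the poster's code.

```python
import numpy as np

def grid_normals(heights: np.ndarray, cell: float = 1.0) -> np.ndarray:
    """Per-vertex unit normals for a height grid, returned with shape (N, N, 3).

    heights[z, x] is the height of the vertex in row z, column x;
    cell is the assumed (hypothetical) spacing between neighbouring vertices.
    """
    # np.gradient uses central differences inside and one-sided ones at the edges.
    # It returns one array per axis: axis 0 (z) first, then axis 1 (x).
    dh_dz, dh_dx = np.gradient(heights, cell)
    # For a surface y = h(x, z), an unnormalised normal is (-dh/dx, 1, -dh/dz).
    n = np.stack([-dh_dx, np.ones_like(heights), -dh_dz], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

    A flat grid yields (0, 1, 0) everywhere, and a ramp rising 0.5 per cell along x tilts every normal towards -x.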

    Thank you very much; an answer to this question would save me a lot of time.


    p.s.
    I prefer the first solution, since the shader can be much more straightforward.
    p.p.s.
    The more technical details you provide me with, the better.
     
  2. Daniel_Brauer

    Unity Technologies

    Joined:
    Aug 11, 2006
    Posts:
    3,355
    Why don't you specify the actual parameters (texture/mesh size, actual target hardware) and run some tests? Aras and Farfarer (and everyone else on here, for that matter) are not computers. This is a question best answered by a computer.
     
  3. Farfarer

    Joined:
    Aug 17, 2010
    Posts:
    2,249
    I have no idea; I'm not an expert, I just like messing with shaders :p

    But yeah, Daniel's right: profile it and find out, because I couldn't call it either way. You might even find there's a speed crossover point when N is above or below a certain value.

    But if you're saying you can calculate the normals extremely efficiently, that sounds like the better option? It really depends on the target platforms and what else is happening in the scene, i.e. whether you'll be CPU- or GPU-bound...
     
  4. IJM

    Joined:
    Aug 31, 2010
    Posts:
    143
    Farfarer, thanks for the feedback.
    I wasn't looking for exact numbers, just an opinion on which should be faster. :)
    I'd like to get one from Aras as well, since he knows the internals (what the Unity rendering engine does at every step in those two cases).
     
  5. Farfarer

    Joined:
    Aug 17, 2010
    Posts:
    2,249
    I don't think what Unity does internally will matter too much. You're not really going to be affecting it in a "negative" fashion either way (i.e. you won't be doing stuff that shouldn't be done, as it were).
     
  6. IJM

    IJM

    Joined:
    Aug 31, 2010
    Posts:
    143
    Unfortunately, that's incorrect. I've written a few engines of my own, nothing too complicated: very basic rendering engines in OpenGL.

    For example, when rendering a mesh you can use display lists and send the mesh data to VRAM only once. The benefit is speed; the disadvantage is that you can't modify the vertices, normals, or UVs afterwards. The only thing you can change is the object's transformation matrix.

    If you wish to change the data, you have to send all of the data every frame, and that is much slower.

    With shaders that problem disappears: they execute on the GPU, so you can use display lists and still move the vertices "within the graphics card". :)
    But you still have to get the displacement data there somehow, and that means textures.
    If you want to move the vertices using data from a texture that changes every frame, you have to upload that texture to the GPU every frame.
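    To make "moving the vertices within the graphics card" concrete, here is a CPU emulation of what a displacement vertex shader does per vertex: look up the heightmap at the vertex's UV and offset its y coordinate. A minimal sketch, assuming a single-channel texture, bilinear filtering, texel centres at (i + 0.5) / size, and clamp-to-edge addressing; `displace` and `scale` are hypothetical names, and the filtering actually available for vertex texture fetch depends on the hardware.

```python
import numpy as np

def sample_bilinear(tex: np.ndarray, u: float, v: float) -> float:
    """Bilinearly filtered lookup of a single-channel texture at UV in [0, 1].

    Assumes texel centres at (i + 0.5) / size and clamp-to-edge addressing.
    """
    h, w = tex.shape
    # UV -> continuous texel coordinates, clamped to the range of texel centres.
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)  # x, y >= 0, so int() floors
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0, x0] * (1 - fx) + tex[y0, x1] * fx
    bottom = tex[y1, x0] * (1 - fx) + tex[y1, x1] * fx
    return float(top * (1 - fy) + bottom * fy)

def displace(positions: np.ndarray, uvs: np.ndarray,
             heightmap: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """CPU emulation of a displacement vertex shader: y += scale * height(uv)."""
    out = positions.astype(float).copy()
    for i, (u, v) in enumerate(uvs):
        out[i, 1] += scale * sample_bilinear(heightmap, u, v)
    return out
```

    The static mesh (positions, UVs) is uploaded once; only `heightmap` changes between frames, which is exactly the per-frame texture upload described above.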

    I'm pretty sure Unity uses vertex buffer objects, at least on OpenGL, and those are a completely different story.

    The point is...
    Whatever you do at the API level has a certain cost, and there are a few ways of doing the same or a similar thing;
    someone who works on the engine itself usually knows the difference in speed. :)

    p.s.
    Thinking about this a bit more, I believe vertex data is the right way to go, since I will calculate the normals on the GPU with OpenCL.
     
  7. Mikeysee

    Joined:
    Oct 14, 2013
    Posts:
    155
    Does Unity even have vertex texture fetch (VTF)? I remember looking into GPU-based particle systems a while back and dismissing the idea because VTF wasn't there...
     
  8. duke

    Joined:
    Jan 10, 2007
    Posts:
    763
    It most certainly does!
     
  9. hippocoder

    Digital Ape

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Pretty sure that once you get enough verts, the GPU solution (vertex shader) is tonnes faster, but that depends on how much work you're going to do when you say "generate bitmap". So, like others have said... profile.
     
  10. drudiverse

    Joined:
    May 16, 2013
    Posts:
    218
    If it's performance-critical, e.g. rewriting 1000 vertices per frame: the PC can't handle many rewrites of 1000 vertices per frame; that's about 50 percent of the processing of a 2010 desktop PC. So that's the mesh rewrite. The shader can do 100 times more than that per frame on a static mesh if the CPU is otherwise idle, but rewriting the vertices on the CPU and then sending the data to the shader already ties up the processor as much as just changing the mesh. The shader part of anything you do after the CPU work is pretty much free; one draw call per texture per shader costs the same whether the heightmap changes or is static, I think. If you're targeting meshes of 2,000 to 10,000 vertices, definitely go for preprocessed heightmaps fed to a shader.

    Oh yes, a very slow thing to send to the shader is a new mesh... that's why the limit is roughly 2000-vertex meshes per frame... it's the actual sending of the mesh to the shader that takes a massive performance hit. You can send a 65k-vertex mesh to the graphics card once and modify it many times with textures, but you can only update a few thousand vertices per frame from the CPU while staying above 25 fps.

    Incidentally, for very advanced answers to that question, Stack Exchange can be very informative.
     
    Last edited: Oct 29, 2014
  11. CHPedersen

    Joined:
    Mar 2, 2011
    Posts:
    63
    I agree with everyone else that this is best tested on actual hardware.

    It's really hard to say which one will be faster, because they're both limited by rather heavy utilization of the most precious resource the pipeline has: bus bandwidth.

    Solution 1 spends it by sending updated vertex data every frame.
    Solution 2 spends it by sending an updated texture every frame.

    Which one is worse is impossible to tell without exact parameters, but this transmission of data across the bus is almost certainly the bottleneck in both cases. I think the best solution would be 2, if you could generate that height map texture on the GPU itself instead of on the CPU, perhaps by rendering it into a RenderTexture somehow. Then you could use the shader's ability to move those vertices around quickly, without having to upload textures from the CPU every frame.
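    To put rough numbers on the bus-traffic comparison, here is a back-of-the-envelope sketch. It assumes solution 1 uploads a float32 position and normal per vertex (6 floats, 24 bytes) and solution 2 uploads an N×N single-channel heightmap with one texel per vertex, at 4 bytes (32-bit float) or 1 byte (8-bit). Real drivers and Unity may move more data (other vertex attributes, padding, staging copies), so these are lower bounds on bytes transferred, not timings; the function names are hypothetical.

```python
def mesh_upload_bytes(n: int, floats_per_vertex: int = 6) -> int:
    """Solution 1: N*N vertices, each with a float32 position and normal by default."""
    return n * n * floats_per_vertex * 4

def heightmap_upload_bytes(n: int, bytes_per_texel: int = 4) -> int:
    """Solution 2: an N x N single-channel heightmap, one texel per vertex."""
    return n * n * bytes_per_texel

def per_second_mib(bytes_per_frame: int, fps: int = 60) -> float:
    """Sustained upload rate in MiB/s if the data is resent every frame."""
    return bytes_per_frame * fps / (1024 * 1024)

if __name__ == "__main__":
    for n in (64, 128, 250):
        mesh = mesh_upload_bytes(n)
        tex32 = heightmap_upload_bytes(n, 4)
        tex8 = heightmap_upload_bytes(n, 1)
        print(f"N={n:4d}  mesh {per_second_mib(mesh):7.1f} MiB/s  "
              f"float {per_second_mib(tex32):6.1f} MiB/s  "
              f"8-bit {per_second_mib(tex8):6.1f} MiB/s")
```

    Under these assumptions the ratio is fixed at 6:1 (float heightmap) or 24:1 (8-bit heightmap) in favour of solution 2, whatever N is. Whether that translates into a faster frame still depends on the cost of vertex texture fetch on the target hardware, which is what profiling answers; and if the heightmap is generated on the GPU, the per-frame upload for solution 2 drops out altogether.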