Search Unity

  1. Welcome to the Unity Forums! Please take the time to read our Code of Conduct to familiarize yourself with the forum rules and how to post constructively.
  2. Voting for the Unity Awards are OPEN! We’re looking to celebrate creators across games, industry, film, and many more categories. Cast your vote now for all categories
    Dismiss Notice
  3. Dismiss Notice

Call compute shaders from compute shaders

Discussion in 'Scripting' started by Guedez, Nov 24, 2018.

  1. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    823
    Context:
    I have grass that you can cut like in Legends of Zelda: Breath of the Wild.
    You can also build stuff where the grass used to be
    This will 'destroy' the grass.

    Destroying grass is executed by recalculating if there should be grass or not in the while 8x8x8 block.
    The issue arises when the total number of grass is different, the 'length' values for each blade of grass is not in the same index anymore, the only way I know to fix this is by doing a O(n*n) brute force that checks if the new grass exists in the old grass buffer, and if does, copy the length value.

    I intend to first organize the grass in 1x1x1 blocks, and then do the O(n*n) brute force in those blocks, instead of the 8x8x8 block.
    The issue lies in how to organize the grass by position. I'd like to try to do the whole thing in a single compute shader, but I am not sure how to go about it. The whole octree solution would probably force me to do tons of CPU<->GPU calls and queries for buffer sizes and whatnot, so here is the question:

    Is there a way to do the whole procedure in a single kernel, or have a kernel call a ton of other kernels? A way to dispatch shaders from shaders?
     
  2. Lurking-Ninja

    Lurking-Ninja

    Joined:
    Jan 20, 2015
    Posts:
    9,900
    I would move this question to the shader topic, you would have better chance to get answers, since there are people there who actually know something about shaders and it's more rare here since it's more of a game play scripting topic.

    https://forum.unity.com/forums/shaders.16/

    If you want to move your question, just hit the report on the bottom of your initial post and ask the moderator to move.
     
  3. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,466
    Try morton coordinate to avoid cpu to gpu back and forth?
     
  4. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    823
    Apparently not, since Compute Shaders are actually used(at least on my case) to offload expensive gameplay functions rather than making something look nice, which is what most people in the Shaders forum are attempting to do.

    After a quick look in Wikipedia, this seems to be great for datasets that have a fixed amount (for instance, you know that there is a single blade of grass every single position), but this is not the case for my data.
    The maximum amount of grass in each 8x8x8 tile is 16,777,216 but there is hardly ever more than 8,000. And since they were added by a compute shader, the order is almost completely random.
    To make matters worse, there is no way to assume where the 'top left back 1/8th' of the data set starts or ends, since it's impossible to predict it's size
     
  5. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,466
    Well then why it's used for sparse voxel octree, keyword being sparse, look for nvidia and paper about it on gpu
    Turns out I'm reading one that mention morton: https://serv.cusp.nyu.edu/~hvo/papers/sfc.pdf
    probably not relevant but has a case of "sparse" but spatially relevant data

    The thing is I'm reading a lot of paper who are all about bringing those incoherently produce data and gave them order with morton, I'm not a specialist though
     
  6. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    823
    Sounds like crazy mathmagic, this morton thing gives me plenty of ideas on how it can be used for arrays of data you already know how big they are, but it does not give me any idea on how to fix my current issue.

    I could sort both sets with morton (or any sort, like x + y* 256+ z * 65536), then I could solve it in O(n), but I can't use mult threading, and I also have no idea how to multi thread sorting on the GPU works

    I think I will:
    1: GPU: count how many blades of grass are in each 0.5*0.5*0.5 blocks (which would be written to a 4096 uint array)
    2: CPU: allocate the needed ComputeBuffers (maybe cache them)
    3: GPU: split the whole set in 4096 sets (please tell me AppendBuffer arrays are a thing, otherwise I will need to make in batches of 32 or something)
    4: (Optional) GPU?: sort each of these sets
    5: GPU: do each set linearly, but each set gets it's own GPU thread

    I might end up needing to make kernels for "0.5*0.5*0.5 blocks with [0,32) blades, 0.5*0.5*0.5 blocks with [32, 128), etc)

    Sounds like a terrible solution, but if I can make it work in less than 4 frames outside the main thread, it should be good enough, for now.
    Hopefully someone gives me a better solution

    I am even starting to think that I will get more performance doing this on the CPU instead of going through all of these hoops
     
  7. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,466
    I kind of misread the original post anyway, I'm sure there is a solution with morton BUT why destroy the blade of grass? In zelda they left small nub, why not shrink the blade and have constant performance?
     
  8. Lurking-Ninja

    Lurking-Ninja

    Joined:
    Jan 20, 2015
    Posts:
    9,900
    I admittedly don't know too much about shaders, so if I say stupid, please, ignore it. But AFAIK they usually solve this problem with texture maps, no? So wouldn't it be a solution for you to take the grass-size value or the grass-existence value from a texture map and just change the texture when you want to change the grass in certain areas? (Effectively just draw something on the texture)
     
    Guedez and neoshaman like this.
  9. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,466
    Also it looks like the removal problem is similar to what they do for occlusion culling, watch the horizon zero down talk on this subject, it also explain a bit more about how to handle teh gpu.

    Also morton code is design to have access that scale with gpu cache, hence why it's becoming such widely used in gpu. Since gpu are organized in group of thread core, generally multiple of 2^n, morton allow to query all relevant data in cache friendly linearity
     
  10. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    823
    Although I could just *not draw* or *keep it always cut*, the blades of grass below a barn will never, for the rest of the game, most likely, ever grow again. Might as well remove them. If it were not for the lenght/bend values of all the other blades of grass in the same tile, this feature would already be finished

    I've been procrastinating learning how occlusion culling, and even shadows, work for so long that I think I will never actually going and learning it

    I think I have misunderstood you, but you absolute mad man, you might actually have solved it.
    Each blade of grass can exist anywhere within 256x256y256z of a 8*8*8 tile.
    Since they most often only occupy a single plane of this 3D space, I only store:
    position uint(x, y, z, 0)
    length float
    bendDirection float
    bendStrenght float
    normalRotation float
    This usually uses up 256*256*1*(4*5) bytes of video memory (1.25 Mega)
    Storing this data in 3D would use up 256*256*256*(4*4) bytes whopping 256 mega of video memory, that's why i didn't do it.
    But a single 1x1x1 block would only take 16*16*16*(4*4), 64kb of memory.
    So all I need to do to solve this is simple:
    1 GPU: Query which of the 512 1x1x1 blocks within the 8x8x8 tile actually contain any grass (will be usually be between 12 and 28 of them)
    2: CPU: Order the 12-28 operations of
    2.a: go over all the old data, if it's within the current 1x1x1 current block, save the data in 3D
    2.b: go over all of the new data, if it's within the current 1x1x1 block, read from the 3D data

    2.a and 2.b does not requires me waiting any data from the GPU, that only happens once, when I check which blocks actually need to be saved in 3D

    If no better idea comes up, I will check if this one works tomorrow
     
    Last edited: Nov 25, 2018
  11. neoshaman

    neoshaman

    Joined:
    Feb 11, 2011
    Posts:
    6,466
    do whatever is best for you

    but just fpr completion here was the video
     
  12. Guedez

    Guedez

    Joined:
    Jun 1, 2012
    Posts:
    823
    Worked pretty nicely
    The FPS drops are actually a massive Editor Overhead problem I have, no idea which script I made is causing it

    https://my.mixtape.moe/lluhgo.webm