Fastest way of updating a mask texture.

Discussion in 'General Graphics' started by MadSwag, Jul 31, 2020.

  1. MadSwag

    Joined: Oct 20, 2018 | Posts: 4

    Hello,

    I like to start my questions with a bit of context, but if you don't care about it that much, please jump directly to the TL;DR in capital letters and read the text below it.


    I'm working on a game with a world map (similar to Risk or the Paradox games) where each tile or province is represented by a unique color that serves as its own ID, e.g. Color(255,0,0) is ID 1, Color(254,0,0) is ID 2, and so on.
    This method allows me to represent up to 256 * 256 * 256 different provinces in game (more if we consider alpha, but three channels is already unbearable :) ).
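
    The color-as-ID scheme above can be sketched roughly as follows. This is only an illustration in Python, not the poster's code: the two examples given (Color(255,0,0) is ID 1, Color(254,0,0) is ID 2) are reproduced, but how the green and blue channels extend the sequence is an assumption, and color_to_id / id_to_color are hypothetical names.

```python
def color_to_id(r, g, b):
    """Map an 8-bit RGB color to a 1-based province ID.

    Red counts down from 255 so that (255, 0, 0) -> 1 and (254, 0, 0) -> 2,
    as in the post. Using green and blue as the higher-order digits is an
    assumption made for this sketch.
    """
    return (255 - r) + 256 * g + 65536 * b + 1


def id_to_color(province_id):
    """Inverse of color_to_id under the same assumed channel ordering."""
    n = province_id - 1
    return (255 - (n & 0xFF), (n >> 8) & 0xFF, (n >> 16) & 0xFF)
```

    With three 8-bit channels this gives 256 * 256 * 256 distinct IDs, which is the limit mentioned above.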

    When a user touches a province, I use a lookup texture that stores the province shapes, each filled with its unique color. This texture identifies which province the player is interacting with.

    Afterwards, with a compute shader (CS), I write to an Index Texture of 4096 x 4096 pixels (which is equal to 256 x 256 x 256) in which each pixel maps to a given province. Among other purposes, it lets the map shaders know whether a province should be highlighted or which faction it belongs to, along with other dynamic data.
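
    To make the "one pixel per province" mapping concrete, here is a minimal Python sketch. The post does not say how a province maps to a pixel, so the row-major layout and the 0-based index used here are assumptions, and id_to_pixel / pixel_to_id are hypothetical names.

```python
TEXTURE_SIZE = 4096  # width and height of the Index Texture, from the post


def id_to_pixel(province_index):
    """Map a 0-based province index to (x, y), assuming row-major order."""
    if not 0 <= province_index < TEXTURE_SIZE * TEXTURE_SIZE:
        raise ValueError("province index out of range")
    return province_index % TEXTURE_SIZE, province_index // TEXTURE_SIZE


def pixel_to_id(x, y):
    """Inverse mapping: (x, y) back to the 0-based province index."""
    return y * TEXTURE_SIZE + x
```

    The texture has 4096 * 4096 = 16,777,216 pixels, exactly the 256 * 256 * 256 colors available, so every possible province gets its own pixel.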

    Below I explain my current procedure for updating this Index Texture. Note that in a real case I would keep the Index Texture at the smallest size that fits the current provinces rather than keep such a massive texture, but the whole idea here is to push the limits of optimization and see what can be done.

    The Index Texture is currently updated in a CS to which I feed a buffer of modified data. This buffer will be kept small (I hardly expect more than 200 elements at any time), and each element holds two integers of encoded data, including the pixel it modifies. For each pixel of the texture, the CS loops over this buffer and, if it finds data for that pixel, copies it and finishes.
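
    The procedure described above can be sketched on the CPU like this. It is a plain Python stand-in for what each GPU thread would do, not shader code; the element layout (pixel index, value) is an assumption, since the post only says each element is two encoded integers, and apply_updates_gather is a hypothetical name.

```python
def apply_updates_gather(texture, updates):
    """For every pixel, scan the update list and copy the first match.

    texture: flat list of pixel values in row-major order (modified in place)
    updates: list of (pixel_index, value) pairs; this layout is assumed
    Returns the number of comparisons performed, to show how the work scales.
    """
    comparisons = 0
    for pixel_index in range(len(texture)):
        for target, value in updates:
            comparisons += 1
            if target == pixel_index:
                texture[pixel_index] = value
                break  # "copies it and finishes"
    return comparisons


# Tiny 4 x 4 texture with two pending updates.
tiny_texture = [0] * 16
tiny_count = apply_updates_gather(tiny_texture, [(5, 7), (10, 9)])
```

    The work grows with pixels times buffer length: at 4096 x 4096 pixels and 200 elements that is up to 4096 * 4096 * 200 = 3,355,443,200 comparisons per dispatch in the worst case, although on a GPU they are spread across many threads.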

    Now for the questions. I'm thinking of ways to make this process as performant as possible, but I lack some knowledge about the inner workings of CS and how Unity handles them, so some of these ideas may be useless. I'd appreciate any help:

    First thing would be to keep the texture small :) (but we aren't doing it in this example of course).

    The second would be to execute the CS at intervals (three or four times per second at most) instead of every frame. I feel this may improve performance, but I'm not completely sure. Does calling a CS with Dispatch() block the whole thread so that nothing else executes until the CS ends, or does the thread keep working, with synchronization done at the end of the engine's update loop?
    Note that I'm never reading back from the CS, only writing some small buffers to it.
    If the thread doesn't block when Dispatch() is called, it may be much better to call it every frame to keep things updated and avoid any possible lag spike.

    The third option is to dispatch multiple CS, each responsible for updating a smaller Index Texture. For simplicity, instead of one 4096 x 4096 image we would update two 2048 x 2048 images, each with its own buffer (which would be smaller, although I feel the buffer won't be the problem). Still, I'm interested in knowing: would running multiple CS at the same time be possible, or will the CPU thread that calls them be blocked until the last one finishes?

    Does anyone have any ideas on whether there is a better approach to keep this whole process well optimized? Or do you think it will already run great? (I haven't tested it yet, as I haven't had time to start coding it, and I prefer to have all my doubts cleared before jumping into writing code.)

    Thank you very much :)

    TL;DR

    I'm updating a big 4096 x 4096 texture (you don't want to know why, dangerous stuff). I use a CS that loops over a small buffer of around 200 elements on average, each element being a struct of two integers. The CS writes the data of any of these elements to a given pixel if a certain unimportant condition is met. I only write some small buffers to this CS; I never read from it.

    I want to know two things:


    Does calling a CS with Dispatch() block the whole thread so that nothing else executes until the CS ends, or does the thread keep working, with synchronization done at the end of the engine's update loop?

    Can multiple CS run at the same time, or will the thread in which Dispatch() is called block until the first one is done?
     
    Last edited: Jul 31, 2020
  2. Neto_Kokku

    Joined: Feb 15, 2018 | Posts: 1,751

    I'm pretty sure Dispatch will not block the main thread. The command is passed off to the GPU, which has its own list of things to do. It will only really block if you try to read the data back from a ComputeBuffer, in which case the main thread has to wait for the GPU to finish whatever it's doing before it can get to the data.
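
    As a rough illustration of that behavior, here is a toy model in Python. It is not Unity code and says nothing about how Unity is implemented: ToyGpu, dispatch and read_back are made-up names, and a worker thread stands in for the GPU working through its own command list.

```python
import queue
import threading


class ToyGpu:
    """Toy stand-in for a GPU command queue (hypothetical, not Unity's API)."""

    def __init__(self):
        self._commands = queue.Queue()
        self.results = []
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # The "GPU" works through its own list of commands in order.
        while True:
            work = self._commands.get()
            work()
            self._commands.task_done()

    def dispatch(self, work):
        # Only enqueues the command; returns without waiting for it to run.
        self._commands.put(work)

    def read_back(self):
        # Reading results forces the caller to wait for all queued work.
        self._commands.join()
        return list(self.results)


def demo():
    gpu = ToyGpu()
    gate = threading.Event()

    def kernel():
        gate.wait()  # simulates a kernel that is still running
        gpu.results.append("kernel done")

    gpu.dispatch(kernel)
    before = list(gpu.results)  # dispatch has returned, kernel not finished
    gate.set()                  # let the kernel complete
    after = gpu.read_back()     # blocks until the queue is drained
    return before, after
```

    In this model the caller only stalls in read_back, which mirrors the point above: submitting work is cheap, and the wait happens when the CPU asks for the data.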

    As for dispatches running in parallel, it depends. There is no guarantee of this behavior, but GPUs can and will try to parallelize work if possible: different shader "cores" can execute different workloads if there is no data dependency between them. You can observe this if you debug your application using a low-level vendor-specific GPU debugger from NVIDIA/AMD/Intel.

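    The catch is that a GPU can usually be executing either compute or rasterization workloads at a given moment, not both. If the GPU supports async compute, you can submit your CS on an async compute queue so that it executes in parallel with rasterization work (in Unity this is done by recording the dispatch into a CommandBuffer and passing it to Graphics.ExecuteCommandBufferAsync; there is no DispatchAsync() method on ComputeShader). Async compute was first introduced on AMD GPUs and is mostly supported by them; NVIDIA kind of struggled with it at first but got its act together with Pascal GPUs. I have no clue how it works on Intel.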
     
  3. MadSwag

    Joined: Oct 20, 2018 | Posts: 4

    Thank you very much for your reply, Kokku, very informative! I didn't know about async dispatch; I'll read up on it.

    Thanks again :)