Search Unity

SIMD and optimizing my algorithm?

Discussion in 'Entity Component System' started by MintTree117, Mar 7, 2020.

  1. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Hello.

    I have a pah-smoothing algorithm which runs post-a*. It basically gets a line between each point in the path, and based on some value interpolates for n-times. Each time it checks al the grid cells within some radious, and returns false if any are unpassable.

    I need some help understanding if I can use SIMD here, and also as always any other performance tips. Because I am taking an almost 0.5 x hit in timing. Or any other LOS algorithms you may know of which you may think is better.

    Code (CSharp):
    1.         // Is there a line of sight between two points?
    2.         private bool LOS( int x1 , int y1 , int x2 , int y2 )
    3.         {
    4.             // Difference
    5.             int dx = x2 - x1;
    6.             int dy = y2 - y1;
    7.  
    8.             // The number of steps
    9.             int n = math.max( math.abs( dx ) , math.abs( dy ) );
    10.  
    11.             //UnityEngine.Debug.Log( "n " + n + " " + "dx " + dx + " " + "dy " + dy );
    12.  
    13.             // The width/height of the rectange to check bounds with
    14.             float r = 1f;
    15.  
    16.             // Used an int instead of a bool so I could do additions instead of branches?
    17.             // I think this is more performant
    18.             int lineOfSight = 0;
    19.  
    20.             // Step along the line
    21.             for ( int step = 1; step < n; step++ )
    22.             {
    23.                 // The interpolation amount
    24.                 float t = step / (float) n;
    25.  
    26.                 float x = math.abs( math.lerp( x1 , x2 , t ) );
    27.                 float y = math.abs( math.lerp( y1 , y2 , t ) );
    28.  
    29.                 //UnityEngine.Debug.Log("t " + t + " " + "step " + step + " " + "x " + x + " " + "y " + y );
    30.  
    31.                 // Convert a float rect to an int one, which will represent positions in a grid
    32.                 float4 rect = new float4(
    33.                     x - r ,
    34.                     y - r ,
    35.                     x + r ,
    36.                     y + r );
    37.  
    38.                 int4 grid = new int4(
    39.                     ( int ) math.floor( rect.x ) ,
    40.                     ( int ) math.floor( rect.y ) ,
    41.                     ( int ) math.ceil( rect.z ) ,
    42.                     ( int ) math.ceil( rect.w ) );
    43.  
    44.                 // Check if any cells are unpassable
    45.                 for ( int row = grid.y; row < grid.w; row++ )
    46.                 {
    47.                     for ( int col = grid.x; col < grid.z; col++ )
    48.                     {
    49.                         lineOfSight += pathNodeArray[ col + row * gridSize.x ].walkable;
    50.                     }
    51.                 }
    52.             }
    53.  
    54.             return lineOfSight == 0;
    55.         }
     
  2. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    There's one algorithm I know that is very fast, but it requires walkable to be stored in a bit array padded so that rows are a multiple of a specific power of two long. The exact power of two depends on how many bytes it takes to represent the row length and whether or not you are using AVX. Padded values should be considered walkable (it's weird, but trust me on this). Also, walkable cells should be false and non-walkable cells should be true.
    1) Creating a bounding box around the end points.
    2) Create an empty bit array the same dimensions as the walkable array, but clear to zero. We'll call this the mark array.
    3) Using a simd loop, for each cell, calculate if the distance from the cell to the line is less than 1.5 units (actually can be less if you know the grid corner overlap tolerances and slope tolerances), and if so, mark that cell in the mark array as true. Note that it is ok to spill over the left and right boundaries to get everything aligned. However, if you spill over, make sure you mask off those outside bits to zero in a postpass.
    4) Bitwise AND the walkable bit array and the mark array over the concerning region. If any result is not 0, then the line of sight is blocked. Otherwise if all results are 0, then line of sight is achieved.

    The hardest part is step 3. You will need to be clever about extracting bitmasks from simd comparisons, and combining that wwith bit reverses and shift operations to put the masks in the right spots.
     
  3. MintTree117

    MintTree117

    Joined:
    Dec 2, 2018
    Posts:
    340
    Reading this, it makes sense but it makes me realize there is quite alot I don't know. This will take me a while, but thanks. I will have to learn how to do proper SIMD loops and I don't even know what a bit-mask is.
     
  4. sngdan

    sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,154
    temps12 and MintTree117 like this.
  5. DreamingImLatios

    DreamingImLatios

    Joined:
    Jun 3, 2017
    Posts:
    4,264
    You may not want to start here (or maybe you do), but I write guides on low-level Burst optimizations which may be helpful on your journey. Some of the concepts covered in this one are required for the solution I suggested: https://github.com/Dreaming381/Lati...ptimization Adventures/Find Pairs - Part 1.md
     
    Attatekjir and MintTree117 like this.