String.memcpy() in profiler?

Discussion in 'Editor & General Support' started by Stephan-B, Jan 12, 2012.

  1. Stephan-B

    Stephan-B

    Joined:
    Feb 23, 2011
    Posts:
    2,269
    Never seen this pop in the profiler before... What is String.memcpy()? I am not using strings...
     
  2. justinlloyd

    justinlloyd

    Joined:
    Aug 5, 2010
    Posts:
    1,680
    Not at all?

    Strings in .NET are immutable, so even something as simple and as innocuous as:

    Code (csharp):
    Debug.Log("The value of the variable is " + myVariable);
    Generates a memory copy operation.
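    To make the immutability point concrete, here is a minimal sketch (not from the original post; the class and method names are hypothetical). It only shows that concatenation produces a new string object and leaves the original untouched, which is why the runtime has to copy characters somewhere:

```csharp
using System;

public static class ConcatDemo
{
    // Returns true if concatenation produced a new string and left the original unchanged.
    public static bool ConcatCreatesNewString()
    {
        string prefix = "The value of the variable is ";
        int myVariable = 42;

        // A new string is allocated and the characters of both parts are copied into it.
        string message = prefix + myVariable;

        bool isNewObject = !ReferenceEquals(prefix, message);
        bool originalUnchanged = prefix == "The value of the variable is ";
        return isNewObject && originalUnchanged;
    }
}
```

    Whether that copy shows up as String.memcpy() in the profiler depends on the runtime, so treat this as an illustration of the language behaviour only.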

    There are also game object tags, name comparisons, repeated invoking, coroutines, and so on.

    And then again, it may just be a smallish glitch in the profiler that has shown that in the past, but you don't have sufficient other processing happening to swamp that value.

    How much CPU was the memcpy using?
     
  3. Stephan-B

    Stephan-B

    Joined:
    Feb 23, 2011
    Posts:
    2,269
    Does using the constructor of a Struct convert / use String.memcpy()?

    EDIT: If I remove Vector2 Location from the struct, String.memcpy() is nowhere to be found and the function runs (deep profiling) in 1.76ms. If I add it back, String.memcpy() is back and the function runs in 5.64ms.

    So why does using the Vector2 cause this String.memcpy()? Is it just because Vector2 has a ToString() function in it?

    Code (csharp):
    public struct Cell
    {
        public int x, y;
        public Vector2 Location;
        public byte Walls;
        public float DistanceToTarget;

        public Cell(int X, int Y)
        {
            x = X;
            y = Y;
            Location = new Vector2(X, Y);
            Walls = 0;
            DistanceToTarget = 0;
        }

        public Cell(Vector2 location)
        {
            x = (int)location.x;
            y = (int)location.y;
            Location = location;
            Walls = 0;
            DistanceToTarget = 0;
        }

        public Cell(int X, int Y, byte walls)
        {
            x = X;
            y = Y;
            Location = new Vector2(X, Y);
            Walls = walls;
            DistanceToTarget = 0;
        }
    }
     
    Last edited: Jan 12, 2012
  4. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    Code (csharp):
    public struct Cell
    {
        public int x, y;
        public Vector2 Location;

    Should you have a struct in a struct? I was under the impression that structs should contain base types only. Maybe you mean for Cell to be a class rather than a struct. I'm not clear why you have the location variable anyway when you already have x and y, which seem to be the same thing. It seems like a waste, especially in a struct. Likewise, do you really need a distance to target variable? Seems like something that should be computed as necessary.

    --Eric
     
  5. andorov

    andorov

    Joined:
    Feb 10, 2011
    Posts:
    1,061
    Depends on who you ask. MS best practice states that in the .NET environment, structs should be kept under 16 bytes and be immutable. Structs inside structs are not discouraged.
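    For anyone wondering how big the Cell struct from earlier in the thread actually is, here is a rough sketch (not from the original post). Vec2 is a hypothetical stand-in for UnityEngine.Vector2 so that the snippet compiles without Unity, and Marshal.SizeOf reports the marshaled size, which I am assuming matches the managed layout for a plain struct like this one:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for UnityEngine.Vector2 (two floats, 8 bytes).
public struct Vec2
{
    public float x, y;
}

// Same fields as the Cell struct posted above, without the constructors.
public struct Cell
{
    public int x, y;               // 8 bytes
    public Vec2 Location;          // 8 bytes
    public byte Walls;             // 1 byte, followed by 3 bytes of padding
    public float DistanceToTarget; // 4 bytes
}

public static class SizeDemo
{
    public static int CellSize()
    {
        return Marshal.SizeOf<Cell>();
    }

    public static int Vec2Size()
    {
        return Marshal.SizeOf<Vec2>();
    }
}
```

    With default sequential layout this comes out at 24 bytes for Cell, so it is already over the 16-byte guideline mentioned above.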

    Rarely have I seen people follow "best practice."

    I'm confused as to why this particular struct will cause additional String.memcpy operations. Are you sure you aren't implicitly converting it to a string somewhere?
     
    Last edited: Jan 13, 2012
  6. Stephan-B

    Stephan-B

    Joined:
    Feb 23, 2011
    Posts:
    2,269
    Not sure if you all posted before I made the EDIT.

    Looks like Vector2 is causing this String.memcpy().

    In terms of having this Vector2 Location when I have X and Y, it was purely for convenience but apparently that came with a price.

    With respect to storing DistanceToTarget, after an area is created, I have a function returning a list of cells with their relative distance to a target which never changes so it avoided having to recalculate the distance.

    Right now, I was just experimenting with generating a path to a target, learning and trying different ways, when I ran into this String.memcpy() thing.
     
  7. andorov

    andorov

    Joined:
    Feb 10, 2011
    Posts:
    1,061
    Are you sure the Vector2 isn't being auto-serialized and displayed somewhere, like an inspector?
     
  8. Eric5h5

    Eric5h5

    Volunteer Moderator Moderator

    Joined:
    Jul 19, 2006
    Posts:
    32,401
    I checked it out, and merely using a struct in a struct causes this String.memcpy() thing. Not just Vector2, but any struct, including a custom bare-bones one. If you use a class instead of a struct, it doesn't happen. Not sure what it means, or if it should happen, but probably staying away from structs in structs is a good idea for now.

    --Eric
     
  9. Stephan-B

    Stephan-B

    Joined:
    Feb 23, 2011
    Posts:
    2,269
    Thank you! :)
     
  10. Smooth-P

    Smooth-P

    Joined:
    Sep 15, 2012
    Posts:
    214
    Any word on this from someone with knowledge on the guts and inner workings that cause this to happen?

    I am using structs in structs in a perfectly "reasonable" way in the world of horrendously bad GC (generic structs in generic structs, actually, and if it wasn't for the GC, I'd just use objects) and am seeing tons of these string function calls. They don't take up much time compared to the actual work that is being done, but why they are there at all is quite troubling.

    Unity 4.3.4 on OSX, btw.
     
  11. Smooth-P

    Smooth-P

    Joined:
    Sep 15, 2012
    Posts:
    214
    Pure speculation:

    I'm figuring the memcpy calls are a "worse is better" implementation detail that the Mono guys did in the early days when they were just trying to get a working runtime out the door. Certainly you'd rather copy bytes without a method call, but these probably get inlined on a JIT-able RT anyway. All the method calls could potentially be sapping performance on iOS though.

    And I'm sure there is tons of worse is better code in the ancient Unity Mono build.
     
  12. cecarlsen

    cecarlsen

    Joined:
    Jun 30, 2006
    Posts:
    864
    Reviving this old thread because I encountered the same String.memcpy in the profiler in Unity 2018.3. For me it had nothing to do with structs inside structs. I just had to keep the struct size no bigger than 40 bytes.
     
  13. LaireonGames

    LaireonGames

    Joined:
    Nov 16, 2013
    Posts:
    705
    I am seeing the same thing and I have a feeling it's to do with Matrix4x4, at least in my case.
     
  14. imtehQ

    imtehQ

    Joined:
    Feb 18, 2013
    Posts:
    232
    Can you give an example of why you think it has something to do with Matrix4x4?
    I mean, I get it too on code that runs 100 calls in 0.45 ms (0.08 self), where 0.30 ms (0.05 self) is caused by this.
    I don't get it on the code where I do anything with Matrix data.
     
  15. LaireonGames

    LaireonGames

    Joined:
    Nov 16, 2013
    Posts:
    705
    It's because of where it was appearing. This is the profiling sample that was showing the hit for me:


    Code (CSharp):
    if (fromInstant)
        UnityEngine.Profiling.Profiler.BeginSample("Bending");

    if (block.basicBending)
    {
        bendingAxis.Add(new Vector2(1, 0));
        bendingCenters.Add(new Vector3(x, y - 0.5f, z));
    }
    else
    {
        bendingCenters.Add(directionMatrix.MultiplyPoint3x4(block.bendingData.pivotOffset[i / block.bendingData.verticesPerSection]));//bendingCenters.Add(new Vector3(x, y - 0.5f, z) + offset + data.pivotOffset[i / data.verticesPerSection]);

        Matrix4x4 bendingMatrix = Matrix4x4.TRS(LairMaths.zero, Quaternion.AngleAxis(rotationDegrees, LairMaths.up), LairMaths.one);//so rotate the axis only
        Vector3 axis = new Vector3(block.bendingData.axis[i / block.bendingData.verticesPerSection].x, 0, block.bendingData.axis[i / block.bendingData.verticesPerSection].y);//expand the data

        axis = bendingMatrix.MultiplyPoint3x4(axis);//rotate

        bendingAxis.Add(new Vector2(axis.x, axis.z));//store
    }

    if (fromInstant)
        UnityEngine.Profiling.Profiler.EndSample();
    The main thing going on here is the matrices. I doubt it's the adding to the lists, since the rest of my code is heavy in the same operations and wasn't showing the spike.

    Edit: also pretty sure the majority of the time this code was running the basic bending flag was false since I was also seeing the hit for the matrix math in general
     
  16. imtehQ

    imtehQ

    Joined:
    Feb 18, 2013
    Posts:
    232
    I get it where I use a name for a set of data in a class that is in a list, which I grab and make an instance of.
    I will try to see if it helps to remove it tomorrow.
     
  17. imtehQ

    imtehQ

    Joined:
    Feb 18, 2013
    Posts:
    232
    Sadly, I could not get the string out of my class because it's used too much throughout the program so far. I think I'll just have to live with it.
     
  18. Darkgaze

    Darkgaze

    Joined:
    Apr 3, 2017
    Posts:
    397
    Same here. Lots of String.memcpy() on just a foreach() loop, twice each time I start each loop.

    This must be a problem with data copying in structs. I got rid of it by no longer reading a struct out of an array into a local variable with

    var value = array[i];
    I used array[i] everywhere instead, and it was faster, with no String.memcpy().
     
  19. LaireonGames

    LaireonGames

    Joined:
    Nov 16, 2013
    Posts:
    705
    You probably saw the speed increase because of things covered in this blog:

    https://jacksondunstan.com/articles/5131

    Structs have the potential to perform better if used properly but it of course all depends on your data:

    https://jacksondunstan.com/articles/3860

    If you're only using one array, then sure, your arrays will be faster, but you could beat arrays if you change your structure to be more tightly controlled for cache access. (So if your struct has only 3 values it could be faster than looking up 3 different arrays; if your struct has 20 or so values it's likely slower.)
     
  20. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Sorry to bump an old thread, but what exactly is this issue? I did some profiling and found some strange results.

    Here is my test script:
    https://gist.github.com/prodigga/d2f77012678535bd38cf60817f164886

    Here are some benchmarks:
    • Using Struct
      • Array of values
        • Test1 (Local Copy) 83ms
        • Test2 (Direct access) 19ms
      • List of values
        • Test1 (Local Copy) 126ms
        • Test2 (Direct access) 88ms
    • Using Class
      • Array of values
        • Test1 (Local Copy) 19ms
        • Test2 (Direct access) 19ms
      • List of values
        • Test1 (Local Copy) 24ms
        • Test2 (Direct access) 24ms
    Using structs is immensely slower, and it seems to be related to the size of the struct. The bigger your struct, the worse the performance.

    However, for some reason, structs perform just as well as classes if you are using arrays and you are using the value 'directly'... That is, myArray[i].property instead of var element = myArray[i]; element.property...

    If I reduce the member count down to just a single Vector3, then the performance is similar between structs and classes. But that's pretty restricting. I am doing some processing on some polygons, and for each vertex I am storing position, normal, tangent and bitangent for later use. This is making my code run much slower compared to just using classes. This doesn't feel right. Is this 'expected' behaviour? Are structs really supposed to be this much slower?

    Benchmark reveals that the slowdown is due to String.memcpy.
     
    Last edited: Sep 22, 2019
  21. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Any assistance on this would be much appreciated.
     
  22. Peter77

    Peter77

    QA Jesus

    Joined:
    Jun 12, 2013
    Posts:
    6,619
    If you access a ValueType through a List<>, the value most likely gets copied. Accessing the List[] indexer is a function call that returns the value from an array:
    https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,172

    The bigger the struct, the more memory needs to be copied. Copying more memory costs more than copying less or no memory. duh! :)

    If you access it through an array, which is part of the C# language specification, I would assume the compiler uses this knowledge to remove the memcpy and generate code that accesses the memory address instead, much like a ReferenceType would do. In this case, there is little to no copying performed. You can check this if you look at the generated IL and/or disassembly.
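    The difference between the List<> indexer and array element access can be seen without a profiler. This is a minimal sketch (the Sample struct and method names are made up for illustration): a write through the array lands in the stored element, while the List<> indexer hands back a copy, so the write is lost.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical small struct, only used for this illustration.
public struct Sample
{
    public float Value;
}

public static class IndexerCopyDemo
{
    // Array element access addresses the stored element, so the write sticks.
    public static float WriteThroughArray()
    {
        var array = new Sample[1];
        array[0].Value = 5f;
        return array[0].Value;
    }

    // The List<T> indexer is a method that returns a copy of the element,
    // so the write only changes the local copy.
    public static float WriteThroughListCopy()
    {
        var list = new List<Sample> { new Sample() };
        Sample copy = list[0];
        copy.Value = 5f;
        return list[0].Value;
    }
}
```

    Note that writing list[0].Value = 5f directly does not even compile, precisely because the indexer returns a value and not a variable.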

    The beneficial thing about an array of structs is that all its data is stored in memory sequentially. If you use an array of classes, its elements just point to different memory addresses and the location where they point to, could be at "random memory addresses".

    Why this might not be ideal, is explained in the links below...

    https://jacksondunstan.com/articles/3399

    https://jacksondunstan.com/articles/3860

    https://jacksondunstan.com/articles/5131

    Have you profiled your test with IL2CPP too?
     
  23. Prodigga

    Prodigga

    Joined:
    Apr 13, 2011
    Posts:
    1,123
    Thanks for the reply.

    Here is my setup..:
    • I have an editor only Bezier curve tool. When it changes, I bake down all the points of the curve into a list for later reference.
    • I Iterate over these points and do something with them in other scripts.
    • To ensure that the baked points cannot be modified 'from the outside', I have the Bezier curve return an IReadOnlyList<Point>, and I made Point a struct. This way, other scripts can read the 'baked point' list, but no one can accidentally modify the list.
    This works beautifully. However, reading from the list of points incurs a cost (in the form of String.memcpy()).

    If I turn the Point struct into a class, I'd need to do something like have the Point class implement IReadOnlyPoint, and return a ReadOnly list of ReadOnly Points. But this makes code clunky to author. Some methods use Points, other methods take in ReadOnlyPoints. It is a mess. This is a non-issue with structs.

    It is really surprising to me that using structs can be so slow. I will have to do a 'pure C#' test to check for myself that this isn't a Mono/Unity-specific quirk, as I am having a hard time accepting that this is just the way it is! Hence why I was wondering if the performance impact has anything to do with Unity's version of Mono.

    As this is for editor-specific code, I can't check how this performs with IL2CPP. This code won't ever reach a build. It is for editor tooling.

    This isn't too much of a consideration for me, as in my use case 500,000 is the upper limit of what I am ever expecting to see, and in my results above an array of structs and an array of class instances performed similarly well. (Which is still strange, to be honest. I thought the sequential memory access might've led to a bigger gap in performance, even with only 500,000 elements.)
     
  24. friuns3

    friuns3

    Joined:
    Oct 30, 2009
    Posts:
    307
    lol, I got similar results to Prodigga.
    Another reason to avoid Unity's DOTS: in 90% of cases it's going to be slower, or you have to be a ninja to get everything right...

    There was only one test that ran a little bit faster: when you shuffle the list before enumerating it, structs beat classes.
     
  25. Nyanpas

    Nyanpas

    Joined:
    Dec 29, 2016
    Posts:
    406
    It seems whatever I do, using structs calls string.memcpy(), even though the struct only holds ints, floats, and bools. This is Unity 2017.2.17f1 both Pro/Personal.

    What am I doing wrong?

    [edit] Seems it was a little over 32 bytes in size, so I removed the float2s and only kept references to the vertices by int indexes. I wasn't sure of how much space in bytes float2s would take.

    No more string.memcpy() now. uwu
     
    Last edited: Feb 3, 2020
  26. tim12332000

    tim12332000

    Joined:
    Jun 15, 2017
    Posts:
    20
    [Attached image: upload_2020-9-21_14-48-57.png]
    Sharing this case: it happened with the UGUI Text Outline. o_O
     
  27. BorisTheBrave

    BorisTheBrave

    Joined:
    Nov 18, 2018
    Posts:
    64
    So, there seems to be some confusion in this thread, so let me spell out how it works based on my own experiments.

    1) string.memcpy appears whenever you copy large bitwise-copyable structs. A struct is large if it is greater than 40 bytes. A struct is bitwise-copyable if it contains no reference types (it can contain nested bitwise-copyable structs). string.memcpy is the fastest way to copy a contiguous series of bytes; it has nothing to do with your use of C# strings.

    2) Matrix4x4 is a large bitwise-copyable struct, it's where you're most likely to see this.

    3) Mono's compiler / jit does not optimize away large struct copies. That means that these two snippets can give wildly different runtimes.
    Code (CSharp):
    // Copies the entire struct
    MyLargeStruct temp = array[i];
    return temp.x;

    // Reads value directly from struct
    return array[i].x;
    In a simple micro benchmark (Unity 2019.3.13f1, Mono, play-mode, profiler off), I measured the performance of these as 120ms vs 7ms! In a built exe, it was 26ms vs 2ms, so still significant.


    What can I do about it?

    Try one of the following:

    1) Avoid temporary variables for large structs, or make them reference locals.

    2) Add a dummy reference property to your struct (though this didn't help in built mode, inexplicably making the editor faster than a built exe).

    3) Keep your structs at or below 40 bytes. Maybe have a struct of arrays rather than an array of structs?

    4) Use classes. Classes are never implicitly copied, so won't suffer this particular performance issue.

    5) Use Burst (or IL2CPP). I haven't tested, but these surely don't have the same issue.
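    As a sketch of suggestion 1 (reference locals), assuming a C# 7 or later compiler: BigStruct and both method names are hypothetical, and the struct is padded to 48 bytes only so that it sits above the 40-byte threshold described above. Both methods return the same result; the second one aliases the array element and does not copy it into a temporary.

```csharp
using System;

// Hypothetical struct: 12 floats = 48 bytes, above the reported 40-byte threshold.
public struct BigStruct
{
    public float A0, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11;
}

public static class RefLocalDemo
{
    // Copies every element into a temporary before reading one field.
    public static float SumByCopy(BigStruct[] items)
    {
        float sum = 0f;
        for (int i = 0; i < items.Length; i++)
        {
            BigStruct temp = items[i]; // full struct copy
            sum += temp.A0;
        }
        return sum;
    }

    // Uses a ref local, which is an alias to the array element (no struct copy).
    public static float SumByRef(BigStruct[] items)
    {
        float sum = 0f;
        for (int i = 0; i < items.Length; i++)
        {
            ref BigStruct item = ref items[i];
            sum += item.A0;
        }
        return sum;
    }
}
```

    I have not benchmarked this particular snippet, so how much it saves on a given Unity/Mono version is something to measure for yourself.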
     
    Last edited: Nov 28, 2020
  28. archo5dev

    archo5dev

    Joined:
    Oct 25, 2018
    Posts:
    2
    Something I've come across in my research is that things are actually much worse than they may initially seem.

    I was debugging horrible Matrix4x4 multiply deep profiling performance (particularly on Android) with 2019.4 and found out that the entire struct is copied for every field read. This seems to apply to non-ref arguments and locals only.



    You can even see this in at least some IL2CPP outputs as follows (search for "Matrix4x4_op_Multiply_" in Temp/StagingArea/Il2Cpp/il2cppOutput/UnityEngine.CoreModule.cpp).

    Code (CSharp):
    Matrix4x4_tDE7FF4F2E2EA284F6EFE00D627789D0E5B8B4461  L_128 = ___lhs0; // a full struct copy
    float L_129 = L_128.get_m20_2();

    I believe Clang is capable of optimizing these out however, so the only issue as far as IL2CPP goes is wasting the compiler's time.

    This can be worked around by adding "in" to each parameter.

    In the case of the Matrix4x4 multiplication, the original implementation can be found here: https://github.com/Unity-Technologi.../2019.4/Runtime/Export/Math/Matrix4x4.cs#L183 - the parameter change should reduce the number of copies to 1 in this case.

    On a related note, "this" parameters do not seem to have this issue since they are passed as a reference (basically the same as a ref/in/out parameter).
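    A minimal sketch of the "in" workaround, assuming C# 7.2 or later. Mat4 is a hypothetical stand-in for Matrix4x4 (16 floats, 64 bytes) and is not Unity's actual type. The by-value version receives its own copy of the argument, while the "in" version receives a read-only reference to the caller's struct:

```csharp
using System;

// Hypothetical 64-byte stand-in for UnityEngine.Matrix4x4.
public struct Mat4
{
    public float M00, M01, M02, M03;
    public float M10, M11, M12, M13;
    public float M20, M21, M22, M23;
    public float M30, M31, M32, M33;
}

public static class InParamDemo
{
    // Passed by value: the callee gets its own copy of the 64-byte struct.
    public static float TraceByValue(Mat4 m)
    {
        return m.M00 + m.M11 + m.M22 + m.M33;
    }

    // Passed with "in": a read-only reference to the caller's struct.
    public static float TraceByIn(in Mat4 m)
    {
        return m.M00 + m.M11 + m.M22 + m.M33;
    }
}
```

    Usage is simply InParamDemo.TraceByIn(matrix); the "in" keyword at the call site is optional. Reading fields through an "in" parameter is fine, but calling methods or properties on a non-readonly struct through it can make the compiler insert defensive copies, so this helps most with plain field access.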
     
    Last edited: Dec 1, 2021
  29. Darkgaze

    Darkgaze

    Joined:
    Apr 3, 2017
    Posts:
    397
    Just a quite unrelated comment: If you want performance, use the Mathematics Package + Burst, which will generate SIMD operations with the float4 and float2 and float4x4 variables and it can be 10x faster :)