Garbage Collection, Allocations, and Third Party Assets in the Asset Store

Discussion in 'General Discussion' started by Games-Foundry, Jun 21, 2012.

half_voxel

Joined:

Oct 20, 2007

Posts:

978
@gamesfoundry

Because of JIT compilation, may I suggest that you place your tests inside a small for loop to minimize the effect of the JIT:

Code (csharp):

for (int i=0;i<10;i++) {

//Do Tests Here

//Log test result

}

The first iteration usually takes around 2 to 10 times longer to run due to JIT compilation and may skew the results.
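To make that concrete, here is a minimal sketch of a timing helper that runs a test several times and leaves the first (JIT-affected) run out of the average. The names WarmupBenchmark, Run and AverageIgnoringFirst are made up for illustration; only Stopwatch is a real .NET class.

```csharp
using System;
using System.Diagnostics;

public static class WarmupBenchmark
{
    // Hypothetical helper: runs 'test' several times and records each run in milliseconds.
    public static double[] Run(Action test, int iterations)
    {
        var timings = new double[iterations];
        for (int i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            test();
            sw.Stop();
            timings[i] = sw.Elapsed.TotalMilliseconds;
        }
        return timings;
    }

    // Averages every run except the first one, which includes JIT compilation time.
    public static double AverageIgnoringFirst(double[] timings)
    {
        if (timings.Length < 2) throw new ArgumentException("Need at least two runs.");
        double sum = 0;
        for (int i = 1; i < timings.Length; i++) sum += timings[i];
        return sum / (timings.Length - 1);
    }
}
```

Logging the whole timings array as well as the average makes it easy to see how far off the first run is on your own machine.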
half_voxel, Jun 28, 2012

#101
half_voxel

Joined:

Oct 20, 2007

Posts:

978

Also, in your test I think it would be more appropriate if you stored the GoldenRetriever as a Dog object instead of casting it every time. That would mimic how a Unity event system would do it (probably storing things as MonoBehaviour). I don't think it would cause much of a difference in time... but that's what profiling is for, isn't it? To measure that difference.

We must also remember that while the override-based system may seem faster, Unity will need to call the override on every single MonoBehaviour, even if they don't need to know that the event has occurred. I would say that at least 90% of all calls would not be needed in such a situation, whereas an event system driven by delegates (System.Action for example) will use no CPU time at all if no script has registered to receive the event (well, except for an "is null?" check).
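A rough sketch of what such a delegate-driven event looks like, assuming a made-up TriggerSource class (none of these names come from Unity). When nobody has subscribed, Raise does nothing beyond the null check:

```csharp
using System;

public class TriggerSource
{
    // Hypothetical event; the backing delegate is null until something subscribes.
    public event Action<int> Triggered;

    // Counts only the raises that actually reached a subscriber.
    public int InvocationCount { get; private set; }

    public void Raise(int id)
    {
        // Copy to a local so the null check and the call see the same delegate.
        Action<int> handler = Triggered;
        if (handler == null) return; // nobody listening: the only cost is this check
        InvocationCount++;
        handler(id);
    }
}
```

A script that cares would subscribe with something like source.Triggered += OnTriggered; and unsubscribe with -= when it is done.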

Unity does seem to use a more sophisticated system than just overrides since when you write a function, for example OnTriggerEnter, it does not use the override keyword. Perhaps they are using reflection in some tricky way to increase performance... or are they using Invoke("OnTriggerEnter")?... I really hope they do not use the latter one.
Can anyone enlighten me on that issue?

Sorry for being a bit off topic.

Last edited: Jun 28, 2012

half_voxel, Jun 28, 2012

#102
Arowx

Joined:

Nov 12, 2009

Posts:

8,194

This is great for micro tweaking and caching but really misses the point of the 80:20 rule.

80% of the time your code will be fine.

20% of the time you will need to optimise your code, and basically you should only spend time where it is needed, after you have found slowdowns and used the profiler to locate the bottlenecks.

Arowx, Jun 28, 2012

#103
tatoforever

Joined:

Apr 16, 2009

Posts:

4,369

Arowx said: ↑

This is great for micro tweaking and caching but really misses the point of the 80:20 rule.

80% of the time your code will be fine.

20% of the time you will need to optimise your code, and basically you should only spend time where it is needed, after you have found slowdowns and used the profiler to locate the bottlenecks.

It's the opposite that is true.
20% of your time, your code will be fine.
The remaining 80% of your time, your code will need to be tweaked, optimized, adjusted.

tatoforever, Jun 28, 2012

#104
half_voxel

Joined:

Oct 20, 2007

Posts:

978

tatoforever said: ↑

It's the opposite that is true.
20% of your time, your code will be fine.
The remaining 80% of your time, your code will need to be tweaked, optimized, adjusted.

True (yes, I'm also pessimistic), but maybe not for the optimizations we are talking about here (except perhaps pooling). Optimizations almost always need to be done, but when you need to measure stuff very carefully with millisecond accuracy and start thinking about whether you should use a member variable or a static variable for performance reasons... then you are far away from the usual optimizations you need to make.

half_voxel, Jun 28, 2012

#105
jasonkaler

Joined:

Feb 14, 2011

Posts:

242

gamesfoundry said: ↑

Outcome:
Inheritance-based overriding appears at first glance to be consistently quicker than self-targeted events.

Event-based calling 145ms
Override-based calling 80ms

The difference will probably be less drastic if you add a few parameters to the event, as is often the case.

jasonkaler, Jun 29, 2012

#106
Games-Foundry

Joined:

May 19, 2011

Posts:

632

JasonKaler said: ↑

The difference will probably be less drastic if you add a few parameters to the event, as is often the case.

No difference. In the original test ( prior to the code posted ) I passed the object in the event. The results were the same.

Games-Foundry, Jun 29, 2012

#107
Games-Foundry

Joined:

May 19, 2011

Posts:

632

Thanks to MattCarr for posting up a performance graph for his Advanced Surface Shaders volumetric shader. Hopefully we can encourage other asset store publishers to follow suit and maybe one day it will become expected practice.

Games-Foundry, Jun 29, 2012

#108
Arowx

Joined:

Nov 12, 2009

Posts:

8,194

gamesfoundry said: ↑

Thanks to MattCarr for posting up a performance graph for his Advanced Surface Shaders volumetric shader. Hopefully we can encourage other asset store publishers to follow suit and maybe one day it will become expected practice.

But wouldn't this be completely dependent on the performance of the platform the asset is running on?

tatoforever said: ↑

It's the opposite that is true.
20% of your time, your code will be fine.
The remaining 80% of your time, your code will need to be tweaked, optimized, adjusted.

Maybe that says more about your code than mine

Arowx, Jun 29, 2012

#109
fholm

Joined:

Aug 20, 2011

Posts:

2,052
So much misinformation and so many assumptions in this thread. Anyone experienced with either the .NET or Mono runtimes will know these things, but I assume that not many people using Unity come from a .NET background. Anyway, in no specific order:

For all intents and purposes value types (structs, primitives) are never put on the heap; yes, they will affect the size of a reference type (class) if they exist inside of it as a field. There are exceptions to this, but they're rare and have to do with boxing and interfaces.

The reason it's faster to recycle objects than to create new ones is not because of the constructor call, but rather that it lessens the pressure on the GC.

Local variables are faster than members on reference types because they are allocated on the stack instead of the heap.

The reason a "for ..." is faster than a "foreach ..." is because foreach uses the built-in IEnumerable interface in .NET/Mono which causes a couple of extra method calls. However, using "foreach ..." on an array should result in the compiler inserting a "for ..." loop instead (not 100% sure if this is true on Mono).

Events are slower than overriding because the event system lies on top of the Delegate/MulticastDelegate subsystem of .NET, and while the delegates themselves end up being a callvirt instruction, there is added ceremony around invoking a delegate which costs performance.

Obviously calling "new Vector3()" is more expensive than re-using already initialized memory, but that has nothing to do with the GC and is simply because it costs to invoke the constructor of the struct.

Passing reference types into methods instead of large structs in general does not affect performance, as whatever tiny gain (which I doubt is even measurable) you would get from allocating less stack space for the method call is offset by the cost of the GC when it has to clean up the reference object.

Even when talking about the MS CLR, while they state that value types can be stored on the heap, it's always in connection with a reference object. When you do "Vector3 v = new Vector3();" that value will always be on the stack.

The complaints about some of the Unity methods allocating memory are generally invalid, as most of the time it's because they return an array of objects, and the array needs to be allocated on the heap.

Replacing a List<T> with a LinkedList<T> and using it in the same way as you would a list will lead to more allocations.

I think that's it, g'nite!
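For the boxing exception mentioned in the first point, a small sketch (Point and BoxingDemo are invented names, not from anyone's project): assigning a struct to an object variable copies it into a new heap object, and that copy is what the GC later has to clean up.

```csharp
using System;

public struct Point
{
    public int X;
    public Point(int x) { X = x; }
}

public static class BoxingDemo
{
    // Returns the X stored in the boxed copy after the original local was changed.
    public static int BoxedCopyIsIndependent()
    {
        Point p = new Point(1);   // plain local value, no heap allocation
        object boxed = p;         // boxing: the value is copied into a new heap object
        p.X = 2;                  // changes the local only
        return ((Point)boxed).X;  // unboxing reads the heap copy, which still holds 1
    }
}
```

The same copy-to-heap happens when a struct is passed to a parameter typed as object or as an interface it implements.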
fholm, Jun 29, 2012

#110
Games-Foundry

Joined:

May 19, 2011

Posts:

632

Arowx said: ↑

But wouldn't this be completely dependent on the performance of the platform the asset is running on?

Yes, the timings would vary, so it is only really a sanity check, e.g. a process showing 10ms per frame would warrant further investigation, whereas 0.1ms may not, depending on what the function does. In the context of feeding the GC, the allocations for just a single object will be the more important metric. Some awareness of platforms and an ability to extrapolate the test results to be meaningful for your own project will be required, but you'll probably have that if you are concerned about timings and allocation. Silly example: planning to use a 0.1ms per object function ( as tested on a desktop ) on a million objects per frame on iOS.

Perhaps people have some thoughts on the best way for content authors to publish test reports with the greatest meaning and broadest application?

Games-Foundry, Jun 30, 2012

#111
superpig

Drink more water! Unity Technologies

Joined:

Jan 16, 2011

Posts:

4,660

fholm said: ↑

For all intents and purposes value types (structs, primitives) are never put on the heap; yes, they will affect the size of a reference type (class) if they exist inside of it as a field. There are exceptions to this, but they're rare and have to do with boxing and interfaces.

I don't think it's so "rare" in Unity. What do you think happens to value-type local variables in coroutines?

fholm said: ↑

The reason it's faster to recycle objects than to create new ones is not because of the constructor call, but rather that it lessens the pressure on the GC.

It's not the constructor call, it's the allocator call (i.e. locating the empty chunk of memory to use for the object). It's true that the performance benefit in general comes from reducing GC pressure - it eliminates the framerate spikes, which are a much bigger problem in general - but when doing a like-for-like comparison for object creation, the GC's not relevant. It's not involved in allocating objects, only in deallocating them.
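For anyone who hasn't seen recycling in practice, a bare-bones pool might look something like this. Bullet and BulletPool are hypothetical names for illustration, and a real pool would want a size cap and whatever reset logic the object needs:

```csharp
using System.Collections.Generic;

public class Bullet
{
    public float X, Y;
    public bool Active;
}

public class BulletPool
{
    private readonly Stack<Bullet> free = new Stack<Bullet>();

    // How many objects were actually allocated, as opposed to reused.
    public int CreatedCount { get; private set; }

    public Bullet Get()
    {
        Bullet b;
        if (free.Count > 0)
        {
            b = free.Pop();      // reuse: no allocator call, nothing new for the GC
        }
        else
        {
            b = new Bullet();    // only allocate when the pool is empty
            CreatedCount++;
        }
        b.Active = true;
        return b;
    }

    public void Release(Bullet b)
    {
        // Reset state so the next user gets a clean object.
        b.Active = false;
        b.X = 0f;
        b.Y = 0f;
        free.Push(b);
    }
}
```

Watching CreatedCount in a test scene gives a quick idea of how many allocations the pool is saving.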

fholm said: ↑

Local variables are faster than members on reference types because they are allocated on the stack instead of the heap.

Accessing stack variables is not necessarily going to be faster than the heap (though it usually will be as it'll be in cache). For example, accessing a stack variable from 10 frames up after doing a lot of other unrelated work will probably not be faster than accessing a heap value that you just accessed a moment ago.

fholm said: ↑

The reason a "for ..." is faster than a "foreach ..." is because foreach uses the built-in IEnumerable interface in .NET/Mono which causes a couple of extra method calls. However, using "foreach ..." on an array should result in the compiler inserting a "for ..." loop instead (not 100% sure if this is true on Mono).

Yes. Mono certainly does this some of the time, but not always. I've not yet been able to figure out when it does and when it doesn't.

fholm said: ↑

Passing reference types into methods instead of large structs in general does not affect performance, as whatever tiny gain (which I doubt is even measurable) you would get from allocating less stack space for the method call is offset by the cost of the GC when it has to clean up the reference object.

Er, given that structs are value types and thus passed by value, what about the time impact of creating the copy of the struct in the new stack frame? (It doesn't make a whole lot of sense to compare these scenarios as the semantics are different, to be fair).

fholm said: ↑

The complaints about some of the Unity methods allocating memory are generally invalid, as most of the time it's because they return an array of objects, and the array needs to be allocated on the heap.

They're not invalid. A better solution would be if there were versions of those methods that let us pass in an array to populate. That way we could allocate a maximum-length array once at load time, and then reuse it repeatedly without any further allocations. This is particularly crucial for things that will return large arrays (such as particles - which is exactly why Shuriken's GetParticles method takes an array to populate, as compared to Legacy's particle accessor that allocates a whole new array on every get).
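As a sketch of the two API shapes being compared (ParticleSource and both GetValues overloads are invented here, this is not Unity's actual signature): one overload allocates a fresh array per call, the other copies into a buffer the caller owns and reports how many entries it wrote.

```csharp
using System;

public class ParticleSource
{
    // Stand-in data; a real source would change from frame to frame.
    private readonly float[] live = { 1f, 2f, 3f, 4f, 5f };

    // Allocating style: a new array on every call, which becomes garbage afterwards.
    public float[] GetValues()
    {
        return (float[])live.Clone();
    }

    // Non-allocating style: fills the caller's buffer and returns the number of entries written.
    public int GetValues(float[] buffer)
    {
        int count = Math.Min(buffer.Length, live.Length);
        Array.Copy(live, buffer, count);
        return count;
    }
}
```

The caller would allocate the buffer once at load time and only read the first count entries after each call, since anything beyond that is left over from an earlier call.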

superpig, Jun 30, 2012

#112
hippocoder

Digital Ape

Joined:

Apr 11, 2010

Posts:

29,723

Btw take .NET results with a grain of salt because Mono is not .NET, it's its own port of .NET and probably not as optimised...

In answer to the question posed to me above, yeah we cached everything and got a 5fps boost, enough to reach 60 for the platform game. I think that makes it worthwhile. It's not just the garbage you worry about but the expense of repeated component calls.

I'm happy unity caches transforms their end - because sometimes you can't know in advance, short of using dictionaries (which ends up with messy code).

And since caching stuff is hardly difficult (a nice class, a nice function to stuff it with) why wouldn't you?

Last edited: Jun 30, 2012

hippocoder, Jun 30, 2012

#113
tatoforever

Joined:

Apr 16, 2009

Posts:

4,369

hippocoder said: ↑

Btw take .NET results with a grain of salt because Mono is not .NET, it's its own port of .NET and probably not as optimised...

In answer to the question posed to me above, yeah we cached everything and got a 5fps boost, enough to reach 60 for the platform game. I think that makes it worthwhile. It's not just the garbage you worry about but the expense of repeated component calls.

I'm happy unity caches transforms their end - because sometimes you can't know in advance, short of using dictionaries (which ends up with messy code).

And since caching stuff is hardly difficult (a nice class, a nice function to stuff it with) why wouldn't you?

In Windows the default framework for mono is MS .net. But still, you can use ms or mono .net in windows. XD

Last edited: Jun 30, 2012

tatoforever, Jun 30, 2012

#114
Eric5h5

Volunteer Moderator Moderator

Joined:

Jul 19, 2006

Posts:

32,401

tatoforever said: ↑

In Windows the default framework for mono is MS .net.

No. It doesn't work that way. Unity is all Mono, on all platforms, and makes no use of .NET. As Hippocoder said, Mono is not .NET (it's an open-source clean-room reimplementation of .NET), and while it's similar, it doesn't work the same way.

--Eric

Eric5h5, Jun 30, 2012

#115
tatoforever

Joined:

Apr 16, 2009

Posts:

4,369

Eric5h5 said: ↑

No. It doesn't work that way. Unity is all Mono, on all platforms, and makes no use of .NET. As Hippocoder said, Mono is not .NET (it's an open-source clean-room reimplementation of .NET), and while it's similar, it doesn't work the same way.

--Eric

Well, I didn't mean that. I meant that on Windows you can use MonoDevelop* with whatever .NET framework you have (Mono or MS .NET). Sorry for the confusion.

Last edited: Jun 30, 2012

tatoforever, Jun 30, 2012

#116
Eric5h5

Volunteer Moderator Moderator

Joined:

Jul 19, 2006

Posts:

32,401

tatoforever said: ↑

Well, I didn't mean that. I meant that on Windows you can use MonoDevelop* with whatever .NET framework you have (Mono or MS .NET). Sorry for the confusion.

Not that it makes any difference, since Unity still only uses Mono....

--Eric

Eric5h5, Jun 30, 2012

#117
recon

Joined:

Nov 28, 2009

Posts:

119

Superpig said: ↑

They're not invalid. A better solution would be if there were versions of those methods that let us pass in an array to populate. That way we could allocate a maximum-length array once at load time, and then reuse it repeatedly without any further allocations.

I couldn't agree more.
There are too many cases where the developer has no other choice because there are no function overloads that take pre-allocated arrays:
Physics.RaycastAll, Physics.OverlapSphere, Physics.CapsuleCastAll, Physics.SphereCastAll, GameObject.GetComponents
are just some that I could think of.

recon, Jul 1, 2012

#118
mindlube

Joined:

Oct 3, 2008

Posts:

993

Maybe this horse is beaten already, but I always cache component references b/c the manual says it's a good idea:
http://docs.unity3d.com/Documentation/ScriptReference/index.Performance_Optimization.html

One thing I didn't see in my skim of this thread is foreach() causing lots of memory allocations. I cannot bring myself to stop using foreach, it's so expressive. But that's just a Mono thing, I guess.

mindlube, Jul 3, 2012

#119
Games-Foundry

Joined:

May 19, 2011

Posts:

632

mindlube said: ↑

One thing I didn't see in my skim of this thread is foreach() causing lots of memory allocations. I cannot bring myself to stop using foreach, it's so expressive. But that's just a Mono thing, I guess.

Yup, it's covered in here somewhere.

Games-Foundry, Jul 3, 2012

#120
alexzzzz

Joined:

Nov 20, 2010

Posts:

1,447

fholm said: ↑

The reason a "for ..." is faster than a "foreach ..." is because foreach uses the built-in IEnumerable interface in .NET/Mono which causes a couple of extra method calls. However, using "foreach ..." on an array should result in the compiler inserting a "for ..." loop instead (not 100% sure if this is true on Mono).

I've compared the resulting IL code and the outputs are almost identical.

superpig said: ↑

It's not the constructor call, it's the allocator call (i.e. locating the empty chunk of memory to use for the object).

There's no need to locate anything. The runtime knows exactly where the unfragmented piece of free memory begins. Allocations are almost free unless they awake the GC.

superpig said: ↑

Accessing stack variables is not necessarily going to be faster than the heap (though it usually will be as it'll be in cache). For example, accessing a stack variable from 10 frames up after doing a lot of other unrelated work...

... is just impossible. You can access a stack variable only inside the function it has been allocated in.

Last edited: Jul 4, 2012

alexzzzz, Jul 3, 2012

#121
Games-Foundry

Joined:

May 19, 2011

Posts:

632
Just discovered that Int32.ToString() causes heap allocation. Given we regularly update multiple UI labels over 20+ frames by lerping values, this is something I wanted to remove. Here's a little class that uses a lookup table.

Code (csharp):

using UnityEngine;

public class StringExtensions
{
    // Cached strings for 0..9998, built once so per-frame lookups don't allocate.
    private static string[] lookupInts;
    private static bool mIsInitialized;

    public static void Initialize ()
    {
        lookupInts = new string[9999];
        for ( int i=0; i<9999; i++ )
        {
            lookupInts[i] = i.ToString ();
        }
        mIsInitialized = true;
    }

    public static string ToStringLookup ( int val )
    {
        if ( !mIsInitialized ) Initialize ();
        if ( val >= 0 && val < 9999 )
        {
            return lookupInts[val];
        }
        else
        {
            // Outside the table: fall back to a normal ( allocating ) ToString.
            return val.ToString ();
        }
    }
}
Last edited: Nov 16, 2012

Games-Foundry, Nov 16, 2012

#122
alexzzzz

Joined:

Nov 20, 2010

Posts:

1,447
Code (csharp):

lookupInts = new string[9999];

for (int i = 0; i < 9999; i++)

{

lookupInts[i] = i.ToString();

}

You've just increased garbage collection time by about 100 extra microseconds.
alexzzzz, Nov 16, 2012

#123
Games-Foundry

Joined:

May 19, 2011

Posts:

632
alexzzzz said: ↑

Code (csharp):

lookupInts = new string[9999];

for (int i = 0; i < 9999; i++)

{

lookupInts[i] = i.ToString();

}

You've just increased garbage collection time by about 100 extra microseconds.

I presume you mean the call to ToString() which indeed causes allocation. We run Initialize() once at the start when we initialize all our Managers, after which we do a single forced gc call ( not shown ) to clean that and other launch garbage up before play begins. Sure, this causes allocation on setup, but the benefit is there is no per-frame allocation once play begins.
Games-Foundry, Nov 16, 2012

#124
alexzzzz

Joined:

Nov 20, 2010

Posts:

1,447

I mean, the benefit is obvious, no per-frame allocations, but there is also a downside: from now on every garbage collection will take a bit longer, because there are 10000 more references the GC has to process. Every time you decide to cache something, you should check whether the benefit of caching outweighs the cost of blowing up the heap.

alexzzzz, Nov 16, 2012

#125
Games-Foundry

Joined:

May 19, 2011

Posts:

632

True, but given the current state of Mono 2.6 gc, even a single call to GC during gameplay has to be avoided like the plague because it's so slow. Were we to get the faster GC in Mono 2.8+, then it would be hoped we could actually allow concurrent GC and there would no longer be a need for such a workaround. Currently the benefit outweighs the potential downside ( in our case anyway ).

It's great to debate these matters btw. Thanks for joining in the discussion.

Last edited: Nov 17, 2012

Games-Foundry, Nov 17, 2012

#126
alexzzzz

Joined:

Nov 20, 2010

Posts:

1,447

By the way, how long does an average garbage collection actually take in your case?

alexzzzz, Nov 17, 2012

#127
Games-Foundry

Joined:

May 19, 2011

Posts:

632

alexzzzz said: ↑

By the way, how long does an average garbage collection actually take in your case?

Ignoring the CPU dependency, it's 177ms in the editor. The million pathfinding nodes in the A*Pathfinding grid graph we have might be a big factor in that ( large map requiring fine granularity, plus we can't use navmesh as we have yet to find a solution that will allow us to recalculate areas at runtime when a player places buildings ). To my knowledge, only grid graph allows us to raycast with physics to refresh the graph under a building.

Last edited: Nov 17, 2012

Games-Foundry, Nov 17, 2012

#128
superpig

Drink more water! Unity Technologies

Joined:

Jan 16, 2011

Posts:

4,660

Games Foundry said: ↑

The million pathfinding nodes in the A*Pathfinding grid graph we have might be a big factor in that

Can't you reuse the nodes between pathing queries, so they don't get GCed?

superpig, Nov 17, 2012

#129
Games-Foundry

Joined:

May 19, 2011

Posts:

632

superpig said: ↑

Can't you reuse the nodes between pathing queries, so they don't get GCed?

Aron Granberg kindly addressed that in A*Pathfinding Project after we requested it: paths do get recycled. What probably causes gc to be slow for us is the million node references that form the actual grid graph that would be checked each gc cycle. If only there was a way to mark data to inform the gc to skip it during reference checking.

Actually, since last we spoke, it looks like Aron's added quite a bit, so I'll have a chat with him next time he's on Skype. http://www.arongranberg.com/astar/docs/graph-updates.php. That said, it still looks like Navmesh updates are problematic.

Last edited: Nov 17, 2012

Games-Foundry, Nov 17, 2012

#130
techmage

Joined:

Oct 31, 2009

Posts:

2,133
Ok I did some tests here.

Is iterating through an array of ushorts faster than iterating through an array of ints?

Code (csharp):

ushort[] testArray = new ushort[65535];

var sw = System.Diagnostics.Stopwatch.StartNew();

for( ushort i = 0; i < testArray.Length; i++)

{

testArray[i] = 65535;

}

for( ushort i = 0; i < testArray.Length; i++)

{

testArray[i] = 0;

}

for( ushort i = 0; i < testArray.Length; i++)

{

testArray[i] = 65535;

}

for( ushort i = 0; i < testArray.Length; i++)

{

testArray[i] = 0;

}

for( ushort i = 0; i < testArray.Length; i++)

{

testArray[i] = 65535;

}

for( ushort i = 0; i < testArray.Length; i++)

{

testArray[i] = 0;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0026730
00:00:00.0027073
00:00:00.0028338

Code (csharp):

int[] testArray = new int[65535];

var sw = System.Diagnostics.Stopwatch.StartNew();

for( int i = 0; i < testArray.Length; i++)

{

testArray[i] = 65535;

}

for( int i = 0; i < testArray.Length; i++)

{

testArray[i] = 0;

}

for( int i = 0; i < testArray.Length; i++)

{

testArray[i] = 65535;

}

for( int i = 0; i < testArray.Length; i++)

{

testArray[i] = 0;

}

for( int i = 0; i < testArray.Length; i++)

{

testArray[i] = 65535;

}

for( int i = 0; i < testArray.Length; i++)

{

testArray[i] = 0;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0019804
00:00:00.0019543
00:00:00.0019990

So it appears that iterating through a ushort array takes roughly 38% longer than iterating through an int array (put the other way, the int version takes about 28% less time).

That seems wrong? Does anyone know why that would be?

Let's see what it's like declaring the index and limit outside the loop:

Code (csharp):

ushort[] testArray = new ushort[65535];

ushort o;

ushort limit = 65535;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

testArray[o] = 65535;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 65535;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 65535;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0027841
00:00:00.0027710
00:00:00.0028356

Interestingly, compared with declaring the ushort index inside the for loop as above, this shows no real speed improvement, whereas the int version below does get slightly faster.

Code (csharp):

int[] testArray = new int[65535];

int o;

int limit = 65535;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

testArray[o] = 65535;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 65535;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 65535;

}

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0018625
00:00:00.0018743
00:00:00.0018905

Iterating through ints takes about a third less time than iterating through ushorts.

Does setting an int to a larger or smaller number make a difference?

Code (csharp):

int[] testArray = new int[500000];

int o;

int limit = testArray.Length;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

testArray[o] = 0;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0024679
00:00:00.0024435
00:00:00.0023730

Code (csharp):

int[] testArray = new int[500000];

int o;

int limit = testArray.Length;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

testArray[o] = 5000000;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0023987
00:00:00.0025090
00:00:00.0023672

No difference

Is doing a > comparison on an int or a true false check on a bool faster?

Code (csharp):

int[] testArray = new int[500000];

int o;

int limit = testArray.Length;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

if(testArray[o] > 300)

testArray[o] = 5000000;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0025280
00:00:00.0025867
00:00:00.0025122

Code (csharp):

int[] testArray = new int[500000];

bool[] testBoolArray = new bool[500000];

int o;

int limit = testArray.Length;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

if(testBoolArray[o])

testArray[o] = 5000000;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0024250
00:00:00.0024105
00:00:00.0024439

Doing bool comparison is marginally faster.

What about byte comparison?

Code (csharp):

int[] testArray = new int[500000];

byte[] testByteArray = new byte[500000];

int o;

int limit = testArray.Length;

var sw = System.Diagnostics.Stopwatch.StartNew();

for( o = 0; o < limit; o++)

{

if(testByteArray[o] > 30)

testArray[o] = 5000000;

}

sw.Stop();

print(sw.Elapsed);

00:00:00.0025063
00:00:00.0025392
00:00:00.0025081

Byte comparison is somewhere between bool and int comparison.
Last edited: Nov 19, 2012

techmage, Nov 19, 2012

#131
Games-Foundry

Joined:

May 19, 2011

Posts:

632

Possible explanation?

Using a smaller size int than the machine size may not be faster, in fact it may be slower as a 32 bit CPU retrieves data in 32 bit size chunks.

http://cplus.about.com/od/learnc/ss/variables_2.htm

Games-Foundry, Nov 19, 2012

#132
lilymontoute

Joined:

Feb 8, 2011

Posts:

1,181

Games Foundry said: ↑

Possible explanation?

Using a smaller size int than the machine size may not be faster, in fact it may be slower as a 32 bit CPU retrieves data in 32 bit size chunks.

http://cplus.about.com/od/learnc/ss/variables_2.htm

Generally (and I state very generally as I've only tested this in one use case for a specific situation), I've found that you want to quantize things as either 8-bit or 32-bit. I converted some code from looking up 32-bit values to 16-bit values and actually got a significant slowdown. Byte arrays gave me a large performance increase, though - I'm guessing from greatly reduced cache misses. This is on a 64-bit processor architecture (32 bit program, though).

lilymontoute, Nov 19, 2012

#133
npsf3000

Joined:

Sep 19, 2010

Posts:

3,830

techmage said: ↑

Ok I did some tests here.

Is iterating through an array of ushorts faster than iterating through an array of ints?

And every single one of your ushort tests is broken. Do not iterate with a ushort but with an int, otherwise you're mixing two separate tests together [and will therefore have garbage output]. Furthermore, you're assigning ints to the ushort array, which could be impacting the test.
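Something along these lines would separate the two concerns: both loops use an int index, and each array is written with a value of its own element type, so the only thing that differs is the array type. ArrayFillBenchmark and its method names are made up for this sketch, and the usual caveats apply (run it several times, ignore the first run).

```csharp
using System;
using System.Diagnostics;

public static class ArrayFillBenchmark
{
    // Fills a ushort[] 'passes' times using an int index, alternating max value and 0.
    public static TimeSpan FillUShort(ushort[] array, int passes)
    {
        var sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++)
        {
            ushort value = (p % 2 == 0) ? ushort.MaxValue : (ushort)0;
            for (int i = 0; i < array.Length; i++)
            {
                array[i] = value;
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Same loop shape over an int[], so only the element type differs.
    public static TimeSpan FillInt(int[] array, int passes)
    {
        var sw = Stopwatch.StartNew();
        for (int p = 0; p < passes; p++)
        {
            int value = (p % 2 == 0) ? 65535 : 0;
            for (int i = 0; i < array.Length; i++)
            {
                array[i] = value;
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling FillUShort(new ushort[65535], 6) and FillInt(new int[65535], 6) mirrors the six passes in the original test.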

Last edited: Nov 19, 2012

npsf3000, Nov 19, 2012

#134
alexzzzz

Joined:

Nov 20, 2010

Posts:

1,447

techmage said: ↑

Let's see what it's like declaring the index and limit outside the loop:

It should give you a slowdown on local and private arrays (OOPS: just on local arrays and arrays passed as a parameter), because it forces the jit-compiler to insert boundary checks.

PS
However, I'm not sure Mono's JIT compiler is able to remove boundary checks like Microsoft's does.

Last edited: Nov 19, 2012

alexzzzz, Nov 19, 2012

#135
techmage

Joined:

Oct 31, 2009

Posts:

2,133

Thinksquirrel said: ↑

Generally (and I state very generally as I've only tested this in one use case for a specific situation), I've found that you want to quantize things as either 8-bit or 32-bit. I converted some code from looking up 32-bit values to 16-bit values and actually got a significant slowdown. Byte arrays gave me a large performance increase, though - I'm guessing from greatly reduced cache misses. This is on a 64-bit processor architecture (32 bit program, though).

This is good to know... Should do some more tests on this.

I was totally thinking I was optimizing things by using shorts where possible.

techmage, Nov 19, 2012

#136
techmage

Joined:

Oct 31, 2009

Posts:

2,133

npsf3000 said: ↑

And every single one of your ushort tests is broken. Do not iterate with a ushort but with an int, otherwise you're mixing two separate tests together [and will therefore have garbage output]. Furthermore, you're assigning ints to the ushort array, which could be impacting the test.

You're right, I'll try that when I can.

techmage, Nov 19, 2012

#137
Games-Foundry

Joined:

May 19, 2011

Posts:

632

BUG REPORT #502423: Terrain.CullAllTerrains API call causes C# allocation

Terrain.GetRenderer()
Terrain.GetTerrainRendererDontCreate()
Light.GetLights()

This is the first post regarding allocations from within the Unity API following this topic. This post includes 3 instances within one API call, as highlighted on the attached deep profile image. We have 25 terrains in the scene so the allocations will probably be higher than testing with just a single terrain.

Last edited: Nov 27, 2012

Games-Foundry, Nov 27, 2012

#138
Games-Foundry

Joined:

May 19, 2011

Posts:

632

BUG REPORT #502436: Component.GetComponent et al API calls cause C# allocation

Component.GetComponent()
Component.GetComponentInChildren()
Component.GetComponentsInChildren()
Component.GetComponents()

This is obvious given the nature of the calls (new component or array instantiation), and in many cases caching can remove per-frame use of these functions. However, where that is not possible, superpig and I are suggesting the following overloads/enhancements:

Component.GetComponent ( myComponent )
Component.GetComponentInChildren ( myComponent )
Component.GetComponentsInChildren ( List<T> intoList )
Component.GetComponents ( List<T> intoList )

Developers can pass an existing component or array into the call to be populated. Where the result set is larger than the array size, the method returns the total number of results, and the full array. Where the result set is smaller than the array size, the method returns the size of the result set.
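To make the shape of the request concrete, here is a rough sketch of the "fill a caller-supplied list" pattern in plain C#. This is not a Unity API: `ComponentQuerySketch` and `GetAllInto` are hypothetical names, and an `object[]` stands in for whatever the engine would search internally.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the proposed overloads; not part of Unity.
public static class ComponentQuerySketch
{
    // Fills the caller's list with every candidate of type T and returns the result count.
    public static int GetAllInto<T>(object[] candidates, List<T> intoList) where T : class
    {
        // Clear() resets Count to zero but keeps the existing backing array.
        intoList.Clear();
        for (int i = 0; i < candidates.Length; i++)
        {
            T match = candidates[i] as T;
            if (match != null)
            {
                // Add() only allocates if the list has to grow beyond its capacity.
                intoList.Add(match);
            }
        }
        return intoList.Count;
    }
}
```

The caller keeps one list around and passes it in every frame, so after the list has reached a large enough capacity the call itself creates no new arrays.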

FEATURE REQUEST
Please vote for this feature request.

Last edited: Nov 27, 2012

Games-Foundry, Nov 27, 2012

#139
superpig

Drink more water! Unity Technologies

Joined:

Jan 16, 2011

Posts:

4,660

FWIW, upon further consideration I think using List<T> might be better than arrays - it's backed by arrays internally, ensures you can never access "leftover" slots in the array, and eliminates problems with there being more components than you had room for. Capacity management is less obvious, but it's easy to learn how to do that with a couple of example snippets.
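A minimal snippet of the capacity management I mean, using only standard `List<T>` behaviour (no Unity calls). `ReusableListDemo` and `Refill` are names made up for this example.

```csharp
using System.Collections.Generic;

// Hypothetical demo class showing how a reused List<T> keeps its capacity.
public static class ReusableListDemo
{
    // Empties the list, adds 'count' items, and returns the capacity afterwards.
    public static int Refill(List<int> list, int count)
    {
        // Clear() sets Count to 0 and leaves Capacity alone.
        list.Clear();
        for (int i = 0; i < count; i++)
        {
            // Add() reuses the backing array until Count would exceed Capacity.
            list.Add(i);
        }
        return list.Capacity;
    }
}
```

If you create the list once with `new List<int>(12)`, then refilling it with 12 items and later with 5 items leaves the capacity at 12 both times. Only a refill with more than 12 items forces the list to allocate a bigger backing array.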

superpig, Nov 27, 2012

#140
Games-Foundry

Joined:

May 19, 2011

Posts:

632

Let me make sure I understand correctly. Let's say you pass a List<T>[10] in, and there are 12 results. The API would instantiate and add the extra 2, changing the list capacity to 12. While it's an allocation, it doesn't matter, because it's our recyclable List<T> that we always re-use, just with a larger capacity.

What about the case where there are fewer results than the capacity of the list? While the count will reflect the result size, and the capacity is left unchanged, won't the contents previously in the unused slots, now marked as null, be garbage collected?

Last edited: Nov 27, 2012

Games-Foundry, Nov 27, 2012

#141
recon

Joined:

Nov 28, 2009

Posts:

119
I'm at work so I can't submit a bug report right now, but I can confirm that

Code (csharp):

Physics.RaycastAll()

Physics.OverlapSphere()

Physics.SphereCastAll()

Physics.CapsuleCastAll()

all produce garbage, as they return arrays.
recon, Nov 27, 2012

#142
superpig

Drink more water! Unity Technologies

Joined:

Jan 16, 2011

Posts:

4,660

Games Foundry said: ↑

Let me make sure I understand correctly. Let's say you pass a List<T>[10] in, and there are 12 results. The API would instantiate and add the extra 2, changing the list capacity to 12. While it's an allocation, it doesn't matter, because it's our recyclable List<T> that we always re-use, just with a larger capacity.

Yeah. It's an allocation, but it's one-time, and we can avoid it by correctly sizing our list to 12 in the first place. If we really wanted to go all out on this, there should be an extra 'expandCapacity' flag that indicates whether the list should be expanded to 12 or whether Unity should just discard 2 of the results.

Games Foundry said: ↑

What about the case where there are fewer results than the capacity of the list? While the count will reflect the result size, and the capacity is left unchanged, won't the contents previously in the unused slots, now marked as null, be garbage collected?

Not in this case - remember, the unused slots contain component references. The components they reference will be alive as long as the GameObjects they're attached to are alive, and the references themselves are value types and so won't get GCed (think of them like C++ pointers). This is why calling GetComponent() to get a single component doesn't allocate anything.

superpig, Nov 27, 2012

#143
alexzzzz

Joined:

Nov 20, 2010

Posts:

1,447

I prefer to vote for a generational garbage collector: http://feedback.unity3d.com/unity/scripting/1/top/active/scripting-garbage-collection-sg
It's one stone that kills multiple birds.

alexzzzz, Nov 27, 2012

#144
superpig

Drink more water! Unity Technologies

Joined:

Jan 16, 2011

Posts:

4,660

SGen would definitely be better than the current GC (Boehm?) but it still won't be as good as simply not needing to run the GC at all.

superpig, Nov 27, 2012

#145
Arowx

Joined:

Nov 12, 2009

Posts:

8,194

OK, what about a memory management system built into Unity?

Gamasutra article about building one here

If you have a bespoke build of Mono and the C++ Unity Game Engine code you could probably create a memory management system that is very game friendly!

Arowx, Nov 27, 2012

#146
superpig

Drink more water! Unity Technologies

Joined:

Jan 16, 2011

Posts:

4,660

Similar to the unavoidable terrain wastage (even when you're not using terrains) I'm seeing ~7KB allocated every frame by SendMouseEvents, even though I'm not using any of the OnMouse* events anywhere in my codebase. A flag to turn it off would be most welcome, though if one already exists I'd appreciate someone pointing me at it.

superpig, Nov 27, 2012

#147
Games-Foundry

Joined:

May 19, 2011

Posts:

632

BUG REPORT #502648: Physics Cast API calls cause C# Allocation

Physics.RaycastAll()
Physics.OverlapSphere()
Physics.SphereCastAll()
Physics.CapsuleCastAll()

Returning a new array on each call causes an allocation. A solution similar to that proposed in bug #502436, where we can pass in our own cached list, would be perfect.

Physics.RaycastAll ( ..., List<RaycastHit> myExistingList )
Physics.OverlapSphere ( ..., List<Collider> myExistingList )
Physics.SphereCastAll ( ..., List<RaycastHit> myExistingList )
Physics.CapsuleCastAll ( ..., List<RaycastHit> myExistingList )

Games-Foundry, Nov 28, 2012

#148
Games-Foundry

Joined:

May 19, 2011

Posts:

632

I'm going to use this post as a regularly updated list of possible things to investigate. If others confirm they are also seeing it, or I notice it a lot as I analyze our project, I'll raise a bug report.

API:
None outstanding, all submitted as bug reports.

Mono:
StartCoroutine()

Workarounds
Object.get_name() - added cachedName property
Component.get_tag() - will add cachedTag property or replace any lookups with existing global enum taxonomy
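
As a rough illustration of the cachedName workaround, here is a sketch in plain C#. `EngineObjectStandIn` is a hypothetical stand-in for an engine object whose name getter builds a new string on every call; it is an assumption for demonstration and not how Unity implements `Object.name`. `CachedNameWrapper`, `CachedName`, and `InvalidateName` are likewise made-up names.

```csharp
// Hypothetical stand-in for an object whose name getter allocates on every read.
public class EngineObjectStandIn
{
    private readonly char[] nativeName;

    // Counts how often the allocating getter was actually used.
    public int NameReads { get; private set; }

    public EngineObjectStandIn(string name)
    {
        nativeName = name.ToCharArray();
    }

    public string Name
    {
        get
        {
            NameReads++;
            // A new string instance is created on each call.
            return new string(nativeName);
        }
    }
}

// Reads the name once and hands out the same string instance afterwards.
public class CachedNameWrapper
{
    private readonly EngineObjectStandIn target;
    private string cachedName;

    public CachedNameWrapper(EngineObjectStandIn target)
    {
        this.target = target;
    }

    public string CachedName
    {
        get
        {
            if (cachedName == null)
            {
                cachedName = target.Name;
            }
            return cachedName;
        }
    }

    // Call this if the underlying name is changed, so the next read refreshes the cache.
    public void InvalidateName()
    {
        cachedName = null;
    }
}
```

The trade-off is that the cache goes stale if something renames the object, so it only suits names (or tags) that are set once or changed through a single code path that also invalidates the cache.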

Last edited: Nov 28, 2012

Games-Foundry, Nov 28, 2012

#149
Games-Foundry

Joined:

May 19, 2011

Posts:

632

BUG REPORT #502675: SendMouseEvents.DoSendMouseEvents() causes C# Allocation

The following calls within DoSendMouseEvents() cause allocation:

Camera.get_targetTexture()
Camera.get_allCameras()
RaycastHit.get_rigidbody()
GUILayer.HitTest()

A request was included to provide a means to turn off mouse events.

Last edited: Nov 28, 2012

Games-Foundry, Nov 28, 2012

#150
