Performance Questions

Discussion in 'General Discussion' started by Martin_H, Nov 17, 2015.

  1. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    I'm still rather inexperienced with Unity and gamedev, but I did a bit of iOS game development a few years ago, so I'm very conscious of how performance bottlenecks can dictate game design decisions. Back then it was "If I can optimize this just a little bit more, I can have more units on screen and make the game more fun."

    I'm in the process of slowly trying things out in Unity and evaluating what I can and cannot do in my game. I just stumbled over one thing that I don't quite understand.

    I was comparing a single particle system that generates ~500k vertices, according to the stats window, to 2,500 particle systems that generate the same number of vertices. The single system was about twice as fast. I was wondering how big the overhead of just "having a thing do nothing" is, and modified my test setup to generate 10k empty GameObjects with this empty script attached:

    Code (csharp):
    using UnityEngine;
    using System.Collections;

    public class MUpdateTestCase1 : MonoBehaviour {

        // Use this for initialization
        void Start () {

        }

        // Update is called once per frame
        void Update () {

        }
    }
    Running this empty Update on 10k GameObjects seems to cost ~4 to 4.5 ms per frame.


    I don't know why, but somehow I had expected such performance concerns to magically disappear when switching from mobile to PC. On iOS there was a messaging overhead for calling typical Objective-C functions that was significant enough that I rewrote many performance-critical parts of my game in plain old C, and that really helped.

    Well, here I am trying things out in Unity with C#, and I'm totally clueless what my options are when I face such problems. My impression is that I can't do low-level C stuff in Unity, but I might be wrong. How different is the performance of various aspects between the editor and a compiled Windows build? What would be the best way to test performance in a build? Can I somehow activate the same stats window that I have in the editor?


    I do know the saying about premature optimization. But I keep coming up with ideas that seem to be pushing what my i7 CPU can handle, and that is in isolated test cases without a real game with pathfinding etc. going on at the same time.
    A quick example: one of the last things I was playing around with was a flamethrower that shoots rigidbodies with particle systems attached. They should be able to set red barrels on fire, and when those explode they should be able to start chain reactions and throw other objects around. But with all the physics and particle systems it already drops below 60 fps.
    I'm pretty sure it could look awesome to approximate burning liquids by using rigidbodies with sphere colliders and particle systems, e.g. for a big exploding oil tank that crumbles to pieces while fire comes "flowing" out. But given the low performance I get with many particle systems, I already know that won't be possible.

    I'd appreciate hearing some thoughts on this very broad performance question.
     
  2. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Yep, there's overhead for MonoBehaviour Update, as it needs to access and load all those variables for processing, so it's doing something 10,000 times a frame; evidently that's worth 4 ms. It gets significantly lower with IL2CPP though, from testing.

    Ideally, for large collections you'd just have a plain class and update the instances as you see fit. Update is a little wasteful with overhead, but not slow for typical usage numbers, i.e. around 100.

    A better question to ask is: why would you use 10,000 Updates? Do you need 10,000 MonoBehaviours, or do you need 10,000 items in a list?

    If it's in a list I'd be surprised if it took more than 0.1ms to traverse...
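    For example, a minimal sketch of that idea (plain C# objects ticked from a single MonoBehaviour; the class names are just illustrative):
    Code (csharp):
    using System.Collections.Generic;
    using UnityEngine;

    // Plain class: no per-object MonoBehaviour, no per-object Update() call.
    public class Item {
        public Vector3 position;

        public void Tick (float deltaTime) {
            // per-item work goes here
        }
    }

    // One MonoBehaviour drives the whole collection from a single Update().
    public class ItemRunner : MonoBehaviour {
        private readonly List<Item> items = new List<Item>();

        void Update () {
            float dt = Time.deltaTime;
            for (int i = 0; i < items.Count; i++) {
                items[i].Tick(dt);
            }
        }
    }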
     
    Ryiah, zombiegorilla, Kiwasi and 2 others like this.
  3. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    Yeah, basically don't use Update when you don't need it. That doesn't mean you should avoid it (and there have been some crazy approaches to doing that suggested in the past), it just means not hitting things every tick if you don't actually need to.

    Also, as @hippocoder says, if you're dealing with large collections of things then try to deal with them as a set of data rather than a bunch of individual objects. Tight loops iterating over a contiguous set of data can be much faster than splitting the same computation up across thousands of objects. Mileage varies, of course. For stuff that's performance critical you should first test to see if it's actually going to matter (i.e. do a test case with a straightforward implementation and measure real-world performance), and then prototype different solutions to see which one works best (i.e. more test cases and measurement of real-world performance).
     
    Ryiah, zombiegorilla, Kiwasi and 2 others like this.
  4. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    What those guys said.

    And to state the obvious, blank Update functions should be deleted. There is premature optimisation, which is generally evil. Then there is not being dumb, which is generally good. ;)
     
    Trexug, Ryiah, Martin_H and 3 others like this.
  5. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,984
    Indeed, what the others said.

    It's a pretty general trade-off (in games and other software development): anything that's built for ease of use or convenience often has a performance cost past a certain scale. MonoBehaviours are great and let you hook into a lot of things automatically, but they come at a cost. Usually that cost isn't noticeable in the normal course of most games, but it does add up. With very rare exceptions, we never use them at all. But that comes at a loss of convenience (and added complexity for things that would normally be simple).

    It's like the days of Flash: dropping MovieClips on the stage was simple and fast, but after a certain point it became a bottleneck. You could get much better performance by blitting everything, but then you had to build your own methods for events, collision, etc. It's similar in Unity: MonoBehaviours do a lot of groundwork for you, but after a certain point the overhead becomes a bottleneck.
     
    Ryiah, GarBenjamin, Martin_H and 2 others like this.
  6. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    ... and most people don't know what that means. ;) (To be clear, I'm not saying you're one of them.)

    Those words are often interpreted to mean "optimisation before you run into performance issues", whereas what it really refers to is "optimisation before you understand how the thing works". Ideally you'll avoid performance issues by designing for, constantly measuring, and appropriately managing your application's performance throughout the lifetime of your project.
     
  7. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,327
    Actually, it is "optimization before you've run the thing through the profiler". Just because you understand how it works doesn't mean you know where your bottleneck is.

    People can make an "educated guess", but their guess can turn out to be wrong. So people shouldn't rely on guesses; they should use tools instead.
     
  8. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Penguin's coming from the point of view that the profiler is such an obvious step that it's not worth mentioning. That's optimisation technique; his point is about not optimising before you have a full picture of what needs doing, i.e. understanding it.
     
  9. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    Running something through the profiler and knowing where a bottleneck is is only a part of understanding how a thing works. Also, the Profiler is super useful, but it's only one tool of many.

    I argue the opposite. Knowing where a bottleneck is doesn't imply knowledge of why it's a bottleneck, how it interacts with the rest of the system, or potential solutions. Good luck and obvious inefficiencies aside, you can't fix a bottleneck effectively without understanding the system it is a part of.
     
  10. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,327
    Look, he didn't state it, so I don't know what he meant or what his point of view was. Can't read minds over TCP/IP here.
    Since there are people who don't know what "premature optimization" is, it's safe to assume that the step is not obvious.

    That's a good point, and it is correct.

    However, it is worth mentioning that you do not need a complete understanding of the system, just the portion of it that is related to the bottleneck.
     
    angrypenguin likes this.
  11. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    I generally translate it as "optimisation before you know what to optimise". I think it amounts to essentially the same thing.

    I come from a slightly different background than folks like @zombiegorilla. So far, in the tiny projects I've worked on, code optimisation is not that different from optimising a manufacturing plant. Figure out what is really happening. Figure out where you can eliminate the most waste for the least cost. Eliminate waste. Repeat. It's pretty much lean manufacturing 101.

    Figuring out what is really happening is a skill that takes a lot of time and effort to perfect. It takes understanding the system, science, and measurement results.

    I've found optimising from the top down always works better. Eliminating a step is often better than optimising it. And to do that you need a good high-level understanding of the system. The higher the level you can start from, the better. Eliminating a customer requirement is more effective than jumping through hoops to meet it.
     
  12. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    Thanks a lot for all the replies! I'm happy to see this thread thriving.

    I merely stumbled onto the 10k Update() calls while trying to find out what makes the higher number of particle systems for the same number of particles so inefficient. It's not something that I have a use case for (yet).

    I saw slides on exactly that topic a few days ago; they might even have been linked here somewhere, I can't remember, tbh. But it makes sense to me, even though I don't understand all the details of how the CPU accesses memory.

    Now that I've tried it out it seems obvious, but I wasn't aware that it actually makes a difference. I attribute this to knowing neither C# nor the API and what goes on behind the curtains. I thought that implementing the Update() function in my MonoBehaviour-derived class was like overriding an Update() implemented in the parent class, so I assumed it shouldn't make a difference. Quite obviously I was wrong, and now I know how to avoid that in the future. The next question I had was whether it's the Update() itself that is the problem, calling any function at all, or just having the MonoBehaviour on every object, or something else.

    I cranked up the number of objects a little to make differences more pronounced.
    empty Update() on all objects = ~ 21 ms
    calling an empty DoNothing() manually from the parent object like this
    Code (csharp):
    for (int i = 0; i < count; i++) {
        transform.GetChild(i).GetComponent<MUpdateTestCase1>().DoNothing();
    }
    costs ~14 ms

    And that is already confusing to me, because it feels like I'm making more function calls this way, since I have more code doing the same thing. But since I don't know how everything behind Update() works, maybe there actually are fewer function calls in the background, or whatever else is slowing this down.
    When I do this instead:
    Code (csharp):
    for (int i = 0; i < count; i++) {
        DoNothing();
    }
    I'm down to ~1 ms
    So just calling a function is cheap, which is a relief. I wasn't sure about that.

    I'm interested in what alternatives there are to MonoBehaviours. I'm not saying I want to throw them out right away just because something else is faster, I know that wouldn't be reasonable. I just want to know what my options are. E.g. is it possible to subclass GameObject and give it a bunch of public variables, so that I could more easily treat objects as data to iterate over without having to retrieve all the MonoBehaviour components that encapsulate the data I need an object to store? I did a test, and it seems that GetComponent<>() is rather costly at about ~10 ms in my test setup.


    That's pretty much what I'm trying to do.

    I still wonder what gives me the most accurate data. Is editor + profiler the way to go, or can there be significant differences when I run the compiled build instead? So far I'm doing 99.9% of my testing in the editor and wonder if that is problematic.

    In the example of my game this pretty much means deciding which feature I can afford and which feature I need to cut. CPU time is only one of the resources I need to manage. My time and ability are also very crucial factors. What I'm trying to do is figure out which things I need to cut before I implement them, so that I waste the least time possible.
     
  13. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    I just did a quick Google search on coroutines because they looked like a useful way to avoid doing things every frame that don't need to be done every frame, but I didn't find anything that looked good. Is this a C# thing, so that I should leave out the "unity" keyword in my search? Also, very related: what is the best practice for implementing timers for large numbers of objects? And where could coroutines be problematic, e.g. when I try to implement a "paused" game state to show a menu without switching the scene?
     
  14. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,327
    The issue here is that this doesn't work for large projects. You won't ever have a complete understanding of the system. You'll have a rough overall idea of the architecture, and a good idea of the small portion of the system you're working with. Once you're done with the current portion, you'll move on to another part of the system, and knowledge of the portion you worked on will start fading from your memory.

    The problem with coroutines is that they and their state do not serialize by default. So when you make a small change to a script that is running in a live scene, you'll need to restart the scene.
     
    Martin_H likes this.
  15. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    But that tests how slow GetChild is and how slow GetComponent is. Wouldn't you cache those? DoNothing is the fastest thing there, and probably quite close to 1 ms with caching.
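    Something like this, for example, assuming the set of children doesn't change at runtime (a sketch; CachedCaller is an illustrative name, and DoNothing is the test method from above):
    Code (csharp):
    using UnityEngine;

    public class CachedCaller : MonoBehaviour {
        private MUpdateTestCase1[] cached;

        void Start () {
            // Pay the GetChild/GetComponent cost once, up front.
            cached = new MUpdateTestCase1[transform.childCount];
            for (int i = 0; i < cached.Length; i++) {
                cached[i] = transform.GetChild(i).GetComponent<MUpdateTestCase1>();
            }
        }

        void Update () {
            // The per-frame loop only touches the cached array.
            for (int i = 0; i < cached.Length; i++) {
                cached[i].DoNothing();
            }
        }
    }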
     
    Martin_H, Kiwasi and zombiegorilla like this.
  16. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,984
    Basically, building the framework yourself. We use a framework overlay with Unity: it's entity based, started life as an AS framework, eventually became C++ for our internal engine, and was then converted to C# when we switched to Unity. It uses services extensively and is highly optimized for performance. The trade-off is that for core game development, the engineers have to be very familiar with the framework. There is no "scene" or dropping GameObjects on the stage; it is completely separated from all content and data. That also means that what would be relatively simple changes in a typical Unity-style project are much more involved and require careful planning (and a dedicated team to maintain the framework).

    But that is sort of the extreme end of things (there are only a couple of developers that work in the style we do). With a little planning, you can build subsystems to handle some core elements. But as others have said, optimizing at that level may fall under premature optimization.

    This.

    Also, look at the whole project and find places to cut or make optimizations that give a bigger bang for the buck. Sure, micro-optimizing code may have a small impact, but making some creative choices in the UI (reusing textures, combining, simplifying, etc.), atlasing particles, or even reducing particle count/lifetime can have a much higher impact on performance. Things like that.
     
    Martin_H, GarBenjamin, Ryiah and 2 others like this.
  17. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    In a well-designed architecture the high-level interactions between systems should be no more difficult to comprehend than each individual system. Do that on several layers and you can easily "zoom" in and out.

    Often the performance of an entire system will be defined by the speed of a single bottleneck. Sometimes the best approach is to zoom in on the bottleneck and optimise the hell out of it. But you can often get more bang for your buck by eliminating the bottleneck or scheduling around it.
     
  18. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,327
    Except that you have absolutely no reason to expect that the system you work with will be well designed.
    Any software product will carry the weight of bolted-on new features, rushed deadlines, and hacky workarounds. Most of the stuff will look horrible under the hood.
     
    Martin_H and angrypenguin like this.
  19. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    They're a great tool for many cases. It really depends on the needs, though. Coroutines aren't the easiest thing to manage over time if all you want from them is a variable tick rate. When I use coroutines it's usually to implement a sequence of instructions over time.

    If I want something to be updated regularly, but don't want to do it every tick, it's going to be because there are lots of those things and they don't need to be updated each tick to achieve the desired result. (Why? If there aren't lots of them then it usually doesn't need optimisation. And if it's a single expensive thing then putting delays between executions doesn't solve the performance impact.) So I might handle that by having a ThingUpdateManager which itself uses Update, has a list of all of the ThingsThatNeedPeriodicUpdates, and calls something like ManagedUpdate() on a subset of them each tick.
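    A minimal sketch of that pattern (the interface name and per-frame budget are illustrative assumptions):
    Code (csharp):
    using System.Collections.Generic;
    using UnityEngine;

    public interface IManagedUpdate {
        void ManagedUpdate ();
    }

    public class ThingUpdateManager : MonoBehaviour {
        public int itemsPerTick = 50;    // processing budget per frame
        private readonly List<IManagedUpdate> things = new List<IManagedUpdate>();
        private int cursor;              // where to resume next frame

        public void Register (IManagedUpdate thing)   { things.Add(thing); }
        public void Unregister (IManagedUpdate thing) { things.Remove(thing); }

        void Update () {
            if (things.Count == 0) return;
            int count = Mathf.Min(itemsPerTick, things.Count);
            for (int i = 0; i < count; i++) {
                if (cursor >= things.Count) cursor = 0;   // wrap around the list
                things[cursor].ManagedUpdate();
                cursor++;
            }
        }
    }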
     
    Martin_H and Kiwasi like this.
  20. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    Where possible I (temporarily) eliminate it as a part of the pre-optimisation tests. Measure the system's overall performance with and without it. That difference is the absolute best case outcome for optimising that piece, based on the incorrect but useful assumption that you might be able to find a solution that gets all of the work done with none of the effort. Clearly you won't find such a solution, but it's still a useful measurement because if that difference isn't big enough to meet your goals then you can stop wasting time thinking about optimising that portion of your system.
     
    Martin_H, zombiegorilla and Kiwasi like this.
  21. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    You do have a full .NET (Mono) environment at your disposal. MonoBehaviour can be treated more or less like an adapter to get from .NET code to objects in your scene, rather than as the primary thing that directly performs work. This lets you use more or less any architecture you want for the part of your code that does the work. I can't think of many cases where I'd recommend something like that, though, and in the ones I can think of it's about architecture rather than optimisation.
     
    Martin_H, zombiegorilla and Kiwasi like this.
  22. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    Thanks a lot for all your replies! It's all very interesting and quite a lot of input for me to compute. You've got me convinced that it would be more beneficial to build a test case that is closer to what will actually happen in my game. So far all I've got to show is this:


    I'm still digging through things and trying to get it to run faster. From what I can tell it seems that this one isn't very fast:
    http://docs.unity3d.com/ScriptReference/Physics.OverlapSphere.html
    Should I generally avoid using (or writing) functions for frequently executed tasks that return an array, because of the memory allocations?

    That's good to know, thanks for the heads up! I didn't even know that it could reload scripts at runtime, I always rebooted the scene ^^.

    I've started to use InvokeRepeating() to give my entities a more spaced-out update cycle with a known interval, so that I can do some things with the time passed between updates. Are there any known pitfalls in using InvokeRepeating() for something like this?

    That's one of the things that never really were on my mind because I never used any of this and the whole API is new to me. Thanks for the reminder!


    I think I'll need to spend some more time thinking about the requirements that any architectural changes would likely have, and then think about implementing a "manager" object that I can use to optimize iterating over objects etc. I once wrote a collision manager that I could probably port over and adapt to what I need. Ideally I'd not need to touch the physics simulation, but for the system I have going for emitting heat and setting things on fire, some spatial optimization like presorting objects into grid cells could work, so that I only compare objects that have a chance of affecting each other.

    This thread seemed relevant, I'll leave the link here so that I can take another look at it later:
    http://forum.unity3d.com/threads/inner-workings-of-a-function-physics-overlapsphere.174601/
     
  23. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Looks pretty clear that the 3-4k SetPass calls are a CPU and GPU nightmare. You want to be optimising the hell out of your rendering first.

    Put it this way: I sweat when I have more than 100-200 SetPass calls. This needs to be as low as possible. I don't think anything you're doing is being batched. It looks like, somehow, you're spawning a unique material on every single moving thing. Are you editing material properties at runtime? If so, that will be why. I can't really figure out anything else from that screenshot that would raise SetPass so high.

    Basically, by using as few unique materials as possible and not changing many at runtime, you'll dramatically improve the potential for batching and reduce SetPass calls. This in turn will give you a lot more CPU time back and allow the GPU more time to complete its work.

    But instead of "I've heard so and so is slow" how about "here is a picture of my profiler".
     
  24. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    Well, InvokeRepeating does what it says on the label... no more and no less. It's no more managed than doing the same thing with Coroutines (and I expect that internally they're quite similar).

    For simple cases it will be fine. For more complex cases there's the possibility that it will make things worse, because there's a whole bunch of ways that your calls could coalesce into one big, repeating spike or something like that and there's no management taking place. (For instance, if every enemy has an InvokeRepeating on a 1 second timer for an expensive AI process, and you spawn 100 at the same time, then with a simple implementation your 100 expensive calls will all happen on the same tick, causing a spike every second. That's the opposite of what you want!)

    Using a manager for that kind of thing can be extremely helpful, because it can have some bigger-picture information and use it to help determine the best way to use its budget of processor time. For instance, if you ask it to add 100 new tickable items it would be fairly easy to make it smart enough to spread them out over the available time. It could also re-balance as tickable things are added and removed, or prioritise the update order based on, say, distance from the player (maybe nearby stuff should be updated more often, and distant stuff can be updated less often).
     
    Martin_H likes this.
  25. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    Yes, I have a script on the containers that randomizes their material (as per a suggestion in another thread), but I'm aware that this is wasteful. You're right, in a performance test case this has no business being there; I forgot to deactivate it. It does not make a huge difference though. In direct comparison it costs less than 1 ms per frame. Most of the draw calls must come from the particle effects.

    A reasonable request! I just didn't want to waste anybody's time by having you guys dig through my profiler screenshots while there is still stuff that I could solve on my own. E.g. there's no object pooling at all and every barrel explosion instantiates 6 prefabs.
    In the profiler I have seen spikes of up to ~1 MB under "GC Alloc", and it looked to me like it's coming from Physics.OverlapSphere() - hence my question whether this is a general problem with functions that return arrays.

    I doubt I can get below 200 draw calls, but I'll certainly try to get the number down, thanks for the advice. To be honest, I was never really sure if "SetPass" means "draw call".


    Call me crazy, but I just put a random value in the parameter that determines when it is first executed :D.
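    I.e. something along these lines (the class name and interval values are just examples):
    Code (csharp):
    using UnityEngine;

    public class StaggeredTicker : MonoBehaviour {
        void Start () {
            // A random first delay staggers the calls across frames,
            // while the repeat rate stays at a fixed 1 second.
            InvokeRepeating("SlowTick", Random.Range(0f, 1f), 1f);
        }

        void SlowTick () {
            // expensive periodic work here
        }
    }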
     
  26. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Batches means draw calls; SetPass means setting up a bunch of state changes between draw calls.

    So the first draw call will be slow, as it includes the SetPass, but any extra draw calls after that without a change of material will be fast! Much faster than the initial draw call.

    This is how I was able to guess you were causing Unity to create unique materials. When you modify a material, Unity creates an internal copy, and hence will need a SetPass for it.
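    For reference, the usual culprit looks something like this (a sketch; the randomized colour is just an example):
    Code (csharp):
    using UnityEngine;

    public class BarrelTint : MonoBehaviour {
        void Start () {
            Renderer rend = GetComponent<Renderer>();

            // Accessing .material creates a per-object material instance,
            // which breaks batching and adds SetPass calls:
            rend.material.color = new Color(Random.value, Random.value, Random.value);

            // .sharedMaterial edits the shared asset instead (affecting every
            // object that uses it), so no extra material instance is created:
            // rend.sharedMaterial.color = Color.red;
        }
    }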
     
    Martin_H likes this.
  27. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    That's a great first step and it might be all you need in many cases.
     
    Martin_H likes this.
  28. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    From the Docs:
    http://docs.unity3d.com/Manual/DrawCallBatching.html
    The red barrel mesh has 338 vertices -_-. I guess that broke the batching.
    Reading a bit further...

    =(. Turning those off is not really an option for me. I guess no dynamic batching for me then...
     
  29. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    That's ok, dynamic batching is not the only kind of batching.
     
    zombiegorilla likes this.
  30. Brainswitch

    Brainswitch

    Joined:
    Apr 24, 2013
    Posts:
    270
    You can roll your own; my custom particle system does its own batching, allowing me to draw a quarter of a million particles with 16 batches.
     
    Martin_H and zombiegorilla like this.
  31. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    Absolutely. I was thinking more of static batching when I wrote that, which I probably should have been less vague about.
     
    Martin_H likes this.
  32. Brainswitch

    Brainswitch

    Joined:
    Apr 24, 2013
    Posts:
    270
    As others have said, the more objects, the more overhead (between Unity's core and the scripting language, etc.), and even empty MonoBehaviour methods (the special Unity event methods) are still registered and called by Unity.

    I hope that after reading this thread you've realized that it is possible. Curious to know if you remember what the bottleneck was?

    When I wrote this, each of those 16 batches was also a SetPass call, and writing it made me realize how I could optimize that. So now those 16 batches use only 1 SetPass call :)

    The Physics class has a bunch of methods that return arrays, and yes, they do cause memory allocations. Physics2D has non-allocating versions where you provide your own array, and I believe this will be implemented for Physics as well (in Unity 5.3, if I recall the release notes correctly).
    Is OverlapSphere slow? I wouldn't say so; how many times do you need to call it, and can you cache the result?
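    If that does land, the non-allocating pattern would look roughly like this (the method name is assumed to mirror the Physics2D API; the class name, radius, and buffer size are illustrative):
    Code (csharp):
    using UnityEngine;

    public class HeatEmitter : MonoBehaviour {
        // Reused buffer: no per-call array allocation, so no GC pressure.
        private readonly Collider[] hits = new Collider[64];

        void FixedUpdate () {
            int count = Physics.OverlapSphereNonAlloc(transform.position, 5f, hits);
            for (int i = 0; i < count; i++) {
                // react to hits[i]; anything past 'count' is stale data
            }
        }
    }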
     
    Martin_H likes this.
  33. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    Here is a quick update. Thanks to your awesome help I've been able to significantly improve the performance of the fire system:



    Explosions are currently deactivated. That is the next aspect that I'll work on.


    I want the levels to be generated at runtime. I'm not sure if static batching is still an option; I'll have to read up on that. Maybe I can combine meshes dynamically in code to make it work. But that'll be further down the line. I need to get the level generation up first.

    I think having large numbers of rigid bodies in a pile would be the biggest problem. But I'll give it another try since it seems I'm slowly getting somewhere with my optimizations :).
     
  34. 3agle

    3agle

    Joined:
    Jul 9, 2012
    Posts:
    508
    Everyone has already covered any useful info I would have contributed here, but I just wanted to say that I think this is the most useful optimisation thread that has appeared on this forum in a hell of a long time. Good job, and I hope it continues; no doubt this will help lots of people.
     
    angrypenguin and Martin_H like this.
  35. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    You can merge meshes at runtime. I don't advocate dynamic batching for anything except quads in a mobile sprite game or a 2D game. I don't think it really helps.

    Plus you will have counted the verts incorrectly. How many verts you REALLY have internally depends on UV islands, smoothing groups, and whatever vertex properties you might have, so dynamic batching isn't really any good except for Unity's 2D system :D and it burns CPU just preparing dynamic batches. A better idea is getting your SetPass count down as low as possible.
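    On merging at runtime: Mesh.CombineMeshes does the job, roughly like this (a sketch; it assumes the children share one material and the combined mesh stays under the ~65k vertex limit):
    Code (csharp):
    using UnityEngine;

    public class MeshMerger : MonoBehaviour {
        void Start () {
            // Collect all child meshes and bake them into one mesh.
            MeshFilter[] filters = GetComponentsInChildren<MeshFilter>();
            CombineInstance[] combine = new CombineInstance[filters.Length];

            for (int i = 0; i < filters.Length; i++) {
                combine[i].mesh = filters[i].sharedMesh;
                // Bring each child's vertices into this object's local space.
                combine[i].transform = transform.worldToLocalMatrix * filters[i].transform.localToWorldMatrix;
                filters[i].gameObject.SetActive(false);   // hide the originals
            }

            Mesh merged = new Mesh();
            merged.CombineMeshes(combine);
            gameObject.AddComponent<MeshFilter>().mesh = merged;
            gameObject.AddComponent<MeshRenderer>();      // assign the shared material to this renderer
        }
    }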
     
    Martin_H likes this.
  36. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    I'll tell you why, bluntly: because Martin_H comes across as an OK guy; he isn't mouthy, he doesn't troll people (I'm sure he does, but it's secret), and so on.

    That's right, nice guys get cookies :D
     
  37. 3agle

    3agle

    Joined:
    Jul 9, 2012
    Posts:
    508
    Yes, that could be part of it. It's just nice to see a good collection of solid advice, none of which is misleading or merely tangential to the topic. It's much more informative for a general reader than the usual 'I've found a cool optimisation' thread. Anyway, I don't want to interrupt the good advice! Back to work for me.
     
    angrypenguin and Martin_H like this.
  38. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
    Perhaps Unity's compiler should check for blank Update() methods in MonoBehaviours and delete them from the source when compiling (while keeping them in the "actual source", i.e. the one you edit)? Such an optimization would be great to have. It could even happen only when actually building the project, as performance in the editor isn't as important as in the final product (unless it's REALLY terrible, like 3 fps).
     
    Kiwasi likes this.
  39. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Probably not worth it; it barely impacts anything under IL2CPP. And once IL2CPP is mature, they'll be able to do these optimisations as part of that step, which is one of the plans for IL2CPP (to optimise code with knowledge of the Unity engine).

    Far better would be Unity actually making Enlighten usable, and making things faster which are actually slow :D
     
    Ryiah, Kiwasi and landon912 like this.
  40. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    11,019
    Amen!
     
  41. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,984
    Or just don't add empty methods.
     
    angrypenguin and Kiwasi like this.
  42. superpig

    superpig

    Drink more water! Unity Technologies

    Joined:
    Jan 16, 2011
    Posts:
    4,614
    338 vertices for something that, based on your screenshots, covers barely 70 pixels on screen? That does not seem right.
     
    angrypenguin and Kiwasi like this.
  43. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
    What if you need to use C# features that aren't achievable with an AOT solution like IL2CPP (reflection, etc.)?

    Except the default MonoBehaviour template contains an empty Start() (not a big deal) and an empty Update(). In 8 cases out of 10 you won't know whether you actually need Update there until you actually write and test the code, and when you finish that part... you may forget to remove Update(). And then wonder why it's running so slow.
     
  44. darkhog

    darkhog

    Joined:
    Dec 4, 2012
    Posts:
    2,218
    Future-proofing. You see, some time ago the best smartphones had 320x240 screens; now we have 4K screens, and before long smartphones will have 8K and 16K screens. So in the future it won't take just 70 pixels.
     
    Kiwasi likes this.
  45. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,516
    Just delete it. If you need it, spend the three seconds it takes to put it back.

    Also, with practice in designing things rather than just hacking them together on the fly, you'll have a pretty good idea of whether you'll need it or not.
     
    Kiwasi and zombiegorilla like this.
  46. Kiwasi

    Kiwasi

    Joined:
    Dec 5, 2013
    Posts:
    16,860
    That's a total waste of 15 keystrokes. I don't have that kind of time. :)
     
  47. AcidArrow

    AcidArrow

    Joined:
    May 20, 2010
    Posts:
    11,019
    I see it now, the next asset store killer asset:

    "Empty Monobehaviour remover"
     
    Kiwasi likes this.
  48. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,984
    Edit your templates. Heck, I built a base script generator long ago. I hate having to set up all the references, namespaces and all that stuff by hand every time. I'm old-school lazy when it comes to coding. If I have to do anything more than 3 times, I write a tool/script to do it. ;)

    While empty Updates are not clean, and are less performant than not having them, they won't ever be the cause of anything running 'slow'. Slower by the numbers if you're profiling, sure, but not actually impacting performance by themselves.
     
    Martin_H likes this.
  49. Martin_H

    Martin_H

    Joined:
    Jul 11, 2015
    Posts:
    4,433
    Thanks man! That truly made my day =).

    These are all just placeholder assets that I created when I still had no idea how big they'd be on screen later on. My thinking was that for testing I'd rather be a bit wasteful and have some easily optimizable things left as a "buffer", instead of testing best-case scenarios with simple cubes only. Another reason is what @darkhog hints at. Ideally I can leave some assets at a level of detail that also looks good without the pixelation effect, in cases where that doesn't mean more work to create the asset. I'd like to someday get the asset quality up and lift the veil of the pixelation filter to bump the game up a step on the visual quality ladder. Maybe for a possible sequel, if I ever get to release a part 1. But you are totally right that even without the pixelation I don't need that many vertices. It's on the to-do list :).



    I have found another useful thread here:
    http://forum.unity3d.com/threads/ga...third-party-assets-in-the-asset-store.140673/
    And a video on memory debugging is here:


    I have spent some time eliminating runtime allocations, and it has gotten me quite a bit further towards my goal. I also made some changes to how I keep track of my entities and which one to compare to which other one in certain situations. There could potentially still be a lot of performance gained from some form of spatial optimization, but I'll save that for later. I still need to do the object pooling more properly. Right now it improves performance, but I think it leaks memory by never destroying objects and never reusing them either.

    I have now learned that the difference between profiling a build and profiling in the editor is huge in my use case, the build being about twice as fast. So some of the things I optimized based on editor profiling might not have been as bad as they looked, but either way I now have my test case at a pretty solid 30 fps.

    I've added a few physics objects to the setup:


    Booom:


    Profiler attached to build, slowest frame highlighted:



    For all the fire particles I'm using a single particle system, and I call Emit(1) on it manually once per frame per burning object. To not let it get too crazy I put a cap on the number of particles emitted per frame, save the index, and continue from there on the next frame. That gives an ugly pulsating effect on artificial test grids of neatly sorted burning entities, but in a real-life scenario it is barely noticeable, if at all.
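    In sketch form it's roughly this (it assumes the shared system simulates in world space so it can be moved before each Emit call; the cap value is arbitrary):
    Code (csharp):
    using System.Collections.Generic;
    using UnityEngine;

    public class FireEmitter : MonoBehaviour {
        public ParticleSystem sharedFire;              // one system for all flames
        public int maxEmitsPerFrame = 100;             // the cap
        public List<Transform> burningObjects = new List<Transform>();

        private int resumeIndex;                       // continue here next frame

        void Update () {
            if (burningObjects.Count == 0) return;
            int emitted = 0;
            while (emitted < maxEmitsPerFrame && emitted < burningObjects.Count) {
                if (resumeIndex >= burningObjects.Count) resumeIndex = 0;
                // Move the shared system to the burning object, then emit one particle there.
                sharedFire.transform.position = burningObjects[resumeIndex].position;
                sharedFire.Emit(1);
                resumeIndex++;
                emitted++;
            }
        }
    }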

    I got rid of the OverlapSphere call because the profiler in the editor showed so much memory allocation for it, and I replaced it with iterating over entities that have rigidbodies. This could also be the basis for a future implementation of entities that emit forces within a certain radius every frame. I'm thinking of things like a "singularity cannon" that fires slow-moving, high-damage projectiles that themselves have a slight gravitational pull to increase the visual impact and feel of the shot.
    There is a drawback to this, because I currently only check for the position and distance of objects. As long as I don't have a large variety of sizes that's not an issue, but if it becomes one I could add a size parameter to the entities to modify the distance checks, at very little additional cost.

    One thing that I took out based on editor profiler readings was the emitting of sparks when objects collide above a certain speed. The profiler showed large amounts of allocations for that. I revisited this with the profiler attached to the build just now:
    http://www.keinebilder.de/temp/unity-physics-stress-test12.jpg

    Seems like using OnCollisionEnter is responsible for the allocations. Here I took it out again:
    http://www.keinebilder.de/temp/unity-physics-stress-test13.jpg

    And to be absolutely sure it's not related to what I did within OnCollisionEnter I added an empty OnCollisionEnter function back in:
    http://www.keinebilder.de/temp/unity-physics-stress-test14.jpg

    Unless there is a way to activate/deactivate this gracefully per object, I'll have to evaluate how I can solve this. E.g. I could take the functionality that I had in there, put it into another component, and then decide which of my prefabs benefit most from, or actually need, the functionality. A thing to keep in mind is that my test case is a grid of exploding red-barrel madness, and these allocations only happen when the function is actually called. So if the use case is a great number of entities that don't cause a lot of collisions per frame, it shouldn't be an issue.

    I have not started to optimize draw calls yet, because after all the object pools etc. are filled and the framerate has stabilized, I still have ~230 fps (4.3 ms CPU) with 871 SetPass calls. I'll try to get that lower in the future, but right now it seems other things gave me more "bang for the buck".
     
  50. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Some things just naturally work so well with instancing. Do make haste, Unity :)
     
    Martin_H likes this.