Bug Additively loading scene with large amount of lightprobes causes massive performance spike

Discussion in 'Global Illumination' started by TimHeijden2, Jun 14, 2022.

  1. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    I've been investigating a freeze in my game that occurs when I switch between two levels. In doing this, I first unload the old level, then additively load the new level. This is done for several reasons, including some stuff to do with networking which I won't go into.

    Eventually I was able to reproduce this in a test setup: a pretty much empty project with 2 pretty much empty scenes, both containing a large number of lightprobes. (LightingData.asset is about 34 MB)

    Profiling this using both "Additive" & "Single" LoadSceneModes I observed that there was a massive difference, with the "Additive" method getting a huge spike caused by whatever "PostLoadSceneStaticLightmapSettings" is doing, and the spike being absent when using the "Single" method. (see screenshot)

    I've also sent this as a bug report to Unity, but am curious whether anyone else has run into this and may have found a reason for it and/or a workaround?

    (I tested this in 2021.3 LTS)
     

  2. kristijonas_unity

    kristijonas_unity

    Unity Technologies

    Joined:
    Feb 8, 2018
    Posts:
    1,080
    Hey! Could you please paste the bug ticket ID here? It seems like it still hasn't reached our team
     
  3. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Ah yes, probably should have added that right away. The ticket id is IN-7060
     
  4. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
  5. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Hi @kristijonas_unity, can you please confirm the bug report has reached your team? It is a major problem for us and I wouldn't want it to get lost in the shuffle.
     
  6. kristijonas_unity

    kristijonas_unity

    Unity Technologies

    Joined:
    Feb 8, 2018
    Posts:
    1,080
    It still hasn't. I suspect that it might have gotten stuck in the incoming bug reports queue. I'll reach out to our customer QA directly tomorrow about this.
     
  7. kristijonas_unity

    kristijonas_unity

    Unity Technologies

    Joined:
    Feb 8, 2018
    Posts:
    1,080
    We've received the issue. I've forwarded it to our backlog. Can't give you an ETA right now, but once there are updates, we'll let you know.
     
  8. kristijonas_unity

    kristijonas_unity

    Unity Technologies

    Joined:
    Feb 8, 2018
    Posts:
    1,080
    Wanted to provide you with a quick update on this. We've found that the issue reproduces regardless of which LoadSceneMode is used. The resulting performance spike is almost identical.

    Our developer has taken a brief look at the code, and at a glance there seems to be nothing funky going on there. They will keep investigating.
     
  9. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Thanks for the update! It's interesting that in my test case I was not getting the performance spike nearly as badly in Single LoadSceneMode. (though the test itself was not exactly identical)

    I also did some more testing and found that after reducing the number of lightprobes to about 2/3 (LightingData from 34MB to 20MB) the performance spike dropped dramatically (from 1000ms to 4-5ms). I tested this in the editor and may have made a mistake, since that seems very weird. At the moment we're working on a test that reduces the light probes so we can compare the difference on our target device (where the spike currently is over 20000ms, which is why this is a major problem for us :p ).

    Will also update once I get results on that.
     
  10. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    We've now done a bit more thorough testing and what I've said above is not true at all.

    The performance spike grows roughly linearly with the number of lightprobes, meaning the more you have the higher the spike. Target device measurements:
    * 50000 lightprobes: ~23000ms spike
    * 25000 lightprobes: ~13787ms
    * 10000 lightprobes: ~7072ms

    This means that as you've noted there is most likely not something weird going on in your code in terms of a literal bug. However, this is obviously still a major problem in trying to asynchronously load a scene. (and obviously unacceptable in terms of the performance spike)

    I'd like to know whether you could offload this process to another thread, or at least spread it out over multiple frames so as not to fully lock the main thread?

    If not, it is basically impossible for us to use lightprobes at all for this device... which would be very unfortunate as it greatly affects the visual quality of the game.
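
    As a rough check of the three target-device measurements above, the sketch below fits a least-squares line through them. The variable and function names are illustrative only, and three points are far too few to draw firm conclusions from; this is an estimate, not a profile of what Unity does internally.

```python
# Least-squares line through the three target-device measurements quoted above.
probes = [50_000, 25_000, 10_000]      # light probe counts from the post
spike_ms = [23_000, 13_787, 7_072]     # measured spike per count, in milliseconds

n = len(probes)
mean_x = sum(probes) / n
mean_y = sum(spike_ms) / n

# slope = covariance(x, y) / variance(x); intercept follows from the means.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(probes, spike_ms)) / sum(
    (x - mean_x) ** 2 for x in probes
)
intercept = mean_y - slope * mean_x


def predicted_spike_ms(probe_count):
    """Spike predicted by the fitted line for a given probe count (hypothetical helper)."""
    return intercept + slope * probe_count


print(f"cost per probe: {slope:.3f} ms, fixed cost: {intercept:.0f} ms")
```

    On these three points the fit comes out at roughly 0.4 ms per probe on top of a fixed cost of a few seconds, so the data look linear with an offset rather than strictly proportional.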
     
  11. belgaardunity

    belgaardunity

    Unity Technologies

    Joined:
    Oct 14, 2021
    Posts:
    8
    Thanks for the bug report, I have looked into the root cause of what you have observed.

    In your case you load a single scene at a time and this is usually an optimal case because then there is no need to recalculate tetrahedralization information, a simple memory copy operation is enough. I’m happy to report that we are in the process of backporting an optimisation from the next Unity release to 2021.3 which will roughly double the speed of this copy operation in your case.

    However, it may be that double speed is not quite good enough for you, so you could use a trick which is not well-known. While it is not possible to do the memory copy in a background thread, it is possible to recalculate tetrahedralization information on a background thread by having a persistent scene loaded first, one with very few light probes, then load the real scene. The effect of this is that tetrahedralization information must be calculated after loading the second scene, this can be done asynchronously with LightProbes.TetrahedralizeAsync.

    There will, however, still be a small spike left since some copying will still take place on the main thread, so going forward you should consider breaking up the scene into multiple smaller scenes, each with a much lower number of light probes. This is often a good practice.

    Also, you could consider reducing the density of the light probes network. While it’s convenient to generate a dense network it can be too resource intensive on low-end devices. In the sample project you provided, the tetrahedralization information alone took up about 30 MB of memory. Selectively positioning light probes as described in the documentation would require manual work but could improve performance and results considerably.

    In short,

    • A performance fix will provide nearly a 2x boost in terms of performance. The fix is being backported to 2021.3.

    • Create a scene and place a single light probe group into it. This will act as a persistent reference for light probe data. Additively load other scene(s) containing light probes and re-tetrahedralize asynchronously.

    • Split the light probe network into smaller chunks, and place them into their own separate scenes. Additively load them at runtime.

    • Consider reducing the density of the light probe network. Perhaps most probes are not really needed.
     
  12. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Hi belgaardunity, thanks for looking into the problem.

    Whilst the performance improvement is nice, it indeed will not fix the problem for us because the spike would still be way too large, even if we reduce the amount of lightprobes significantly.

    I tried to test your suggestion RE: a persistent scene reference, but running various tests I wasn't able to make this work without keeping that scene as the active scene forever. Is this expected? I'd also like to make sure I understand: the persistent scene can also be loaded additively, correct?

    For reference, the order of operation would be:
    1. Load the persistent scene with 1 light probe (additive+async)
    2. Set persistent scene as active
    3. Load the real level scene (additive+async)
    4. Set the real level scene as active
    --- level switch ---
    5. Unload the old real level scene (async)
    6. Load the new real level scene (additive+async)
    7. Set the new real level scene active

    And for completeness' sake: we were already using Tetrahedralize in our loading process, just not starting with a "dummy scene" as a reference point.

    If the requirement is that the "persistent" scene cannot be loaded additively and requires single load mode, this is full-stop not an option. If the "persistent" scene CAN be loaded additively, but must be the ActiveScene "forever" this would require significant reworks in a lot of our code, because the active scene for instance also spawns gameobjects in that scene.

    Splitting up a level in multiple scenes also brings significant other feature limitations, workflow limitations and loading performance issues in working with Unity, which is why this really isn't an option for us either...

    Finally whilst reducing the light probes is possible (also sacrificing detail/quality), we won't be able to get it to a point where this isn't a problem for us. 10000 lightprobes, which is 1/5th of our original setup still gives a 7000ms spike, whereas anything over 5000ms really isn't acceptable.

    ------------------------------------

    I'd like to understand the spike itself a bit more as well: What exactly causes the spike?

    A. Loading the 30MB lightprobes from disk into memory?
    I can't imagine this is the problem as loading from disk is easy to do on either another thread or over multiple frames rather than in a single frame/operation.

    B. Merging/Unmerging lightprobes between 2 scenes?
    I don't want to do this at all, my levels are completely unrelated from one another! Is there no way to disable this and force unloading & loading as if it is a new scene? (so A.)

    C. Tetrahedralization? <something else?>
    I doubt this is the problem as well, because I was already doing this operation and could see in the profiler this is not where the spike was.

    D. Something else???
    As noted, I may be missing the entire reason why the spike exists in the first place.
     
  13. belgaardunity

    belgaardunity

    Unity Technologies

    Joined:
    Oct 14, 2021
    Posts:
    8
    Hi Tim,

    The trick I mentioned in my second bullet point is meant to avoid a memory-to-memory copy on the main thread (which would ensure that you do not have to call Tetrahedralize). There is no need to make the persistent scene active, but you need to bake it for the trick to work. In terms of your sample project, try to load the persistent scene along with your existing startingscene in the hierarchy, then enter play mode and run your script. You will notice that the spike is significantly smaller (but you need to call Tetrahedralize before the lighting will look right).

    I hope this works for you.

    Here are some specifics related to your questions,

    A. Loading the 30MB lightprobes from disk into memory?
    It's a memory-to-memory copy, it's done as fast as C++ memcpy can do it (and that's fast). Unfortunately, with the current code structure this cannot be done safely in a background thread.

    B. Merging/Unmerging lightprobes between 2 scenes?
    In your sample project, you already unload, then load, so you only have a single scene loaded at a time. That's often optimal, but in your case you have massive light probes data in a single scene and that is why you see a spike.

    C. Tetrahedralization? <something else?>
    Explicit tetrahedralization is not needed when you have but a single scene loaded at any given time. You can see this by not receiving any LightProbes.needsRetetrahedralization events. With the trick I mentioned you will need to explicitly re-tetrahedralize after loading your real scene.

    D. Something else???
    I hope the above explanation helps.
     
    Last edited: Jul 5, 2022
  14. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    I've just tested your suggestion again, now using the sample project I've sent as you mentioned. The trick reduces the spike loading from "startingscene" to "Scene_A" (16ms to 8ms), but NOT "Scene_A" to "Scene_B" that occurs afterwards. (still ~1000ms)

    Just to confirm I'm not crazy, I swapped loading A & B and got the same results in order, the second load is at 1000ms in editor.

    Here is a link to the updated sample that now includes the dummy scene. Press T to include loading dummy scene, Press Y to exclude loading dummy scene. Adding the dummy scene before entering play mode (and then pressing Y) made no difference for me either.
    https://drive.google.com/file/d/1glaUixaAS-GvzGkw0EHO1sbGhRna0ZBa/view?usp=sharing

    I'm still confused about the spike. Scene_A has 30MB of lightprobes, but so does Scene_B, so why is it not a problem when I load the first scene? (To back that up: surely it doesn't take over 1 second on a PC to copy 30MB of data in memory?)
     
  15. belgaardunity

    belgaardunity

    Unity Technologies

    Joined:
    Oct 14, 2021
    Posts:
    8
    Hmm, now I'm confused.
    First of all, a memcpy of 30 MB will not take a second on a PC. To measure it on my i9 PC I added more profiling info and ran a debug build in the profiler (so in effect a much slower Unity editor), and the copy took around 12 ms.
    Secondly, I tried your suggestion: I appended your dummy scene in the hierarchy, entered play mode and pressed Y. I could not reproduce anything like your 1 second spike; it does not show up in the profiler. I used the official 2021.3.5f1 LTS for this. Could you show a screenshot of what you see in the profiler?
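
    To put the two timings side by side, the sketch below converts them into copy throughput. It assumes decimal megabytes (the thread does not say MB or MiB), and the helper name is made up for illustration.

```python
MB = 1_000_000  # assumption: decimal megabytes; the posts do not specify MB vs MiB


def throughput_mb_per_s(size_bytes, elapsed_ms):
    """Throughput in MB/s for copying size_bytes in elapsed_ms milliseconds."""
    return (size_bytes / MB) / (elapsed_ms / 1000.0)


size = 30 * MB

# ~12 ms: the debug-build memcpy measurement reported above.
measured = throughput_mb_per_s(size, 12)

# ~1000 ms: what the reported editor spike would imply if it were only the copy.
implied = throughput_mb_per_s(size, 1000)

print(f"measured copy: {measured:.0f} MB/s, implied by 1 s spike: {implied:.0f} MB/s")
```

    The gap between the two figures is almost two orders of magnitude, which is consistent with the point made above that a plain memory copy cannot account for a 1 second spike on its own.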
     
  16. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Certainly!

    edit: also added a zip with profiler data (note: I shortened the time between loads in my script to get a smaller profiler snapshot so I could upload this ^^)
     

  17. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Update: I've now also tried this in 2021.3.5f1 LTS and am also NOT getting this spike, interesting! Will try some more versions (and also on the main game) and get back to you.
     
  18. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Alright so my findings:

    - Unity versions:
    * The issue DOES occur in 2021.3.3f1 (the version I used for the bug report)
    * The issue DOES occur in 2021.3.4f1
    * The issue DOES NOT occur in 2021.3.5f1

    - Using (or not using) the dummy scene has no significant effect on the spike (the 1000ms one) in any of the unity versions

    Upgrading my main project will take a longer time, so this will likely come in 4-5 hours.
     
  19. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Hi @belgaardunity

    In my main project I'm getting the following results after upgrading to 2021.3.5f1:

    - Without a dummyscene, the problem still occurs consistently on 2nd level load
    - With a dummy scene, the problem did not occur on 2nd level load

    This is different from what we're seeing in the samplescene, which worries me because we don't know what is the cause for the spike. While it didn't occur in a specific test I did, there is no way of knowing it won't come back through circumstances we're unaware of even with the "trick".

    I'll need to do more thorough testing of this in both the sample project and my main project, but would really appreciate it if you would confirm the issue also occurs for you in the sample project with an older LTS version (like 2021.3.4f1) and if so, if you would be able to find out the cause of the massive spike. (and potentially the reason why that doesn't occur in the sample project with .5f1) This way, there isn't some mysterious/magical vanishing of the issue that could reappear but instead we have concrete evidence.
     
  20. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Hi @belgaardunity

    After running some more tests since the upgrade, I can confirm the issue still occurs in my main game despite the upgrade, even when using the dummy scene. The earlier result was caused by a bug in my code for setting up the dummy lightprobe scene: the lightprobes of the real level weren't being used at all (and thus weren't causing the spike).

    This means this is still an active showstopper for us unfortunately.
     
  21. belgaardunity

    belgaardunity

    Unity Technologies

    Joined:
    Oct 14, 2021
    Posts:
    8
    Hi @TimHeijden2.

    I'm sorry that you still see the spike. I'm afraid that the spike we could reproduce with your repro project was one that is insignificant for you. Unfortunately, we must conclude that we cannot reproduce what you see with your repro project.

    Are you able to reproduce the problem with a repro project?
     
  22. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Unfortunately not, since I don't know why this occurs it is difficult for me to extract the relevant parts of my game into a repro project.

    Just to be 100% sure: Are you saying you cannot reproduce the spike with an older LTS version, like 2021.3.4f1? With my repro project I was able to reproduce it with 100% consistency on several different machines. Whilst it doesn't occur in .5f1 onwards, it may give us a clue what to look for.
     
  23. belgaardunity

    belgaardunity

    Unity Technologies

    Joined:
    Oct 14, 2021
    Posts:
    8
    I actually can repro with 2021.3.4, I just couldn't explain what you are seeing in your game. So I took another look and I have a theory and a potential workaround. It looks like what you see in your game is a "deduplication" algorithm running. This is part of the code which has been heavily refactored in 2022.2 and partly backported to older LTSs including 2021.3.5, the backport probably being the reason you have a hard time crafting a repro for 2021.3.5.

    It's hard to explain exactly why, but it might help to trigger this algorithm to issue a warning in the console "Two Light Probes near ..." which is what you see with your repro project on 2021.3.5. If, on the other hand, you change your repro project to not have duplicate probes, you will be able to repro the problem, even with 2021.3.5. By duplicate probes I mean those which are very close, i.e. their positions overlap or are very close. By "might help" I mean that duplicate probes can cause nearby geometry to flicker, as the warning states.

    This will not be a problem in 2022.2.

    Please let me know if this helps. If it does, I will use that information to check if we can have more of the refactoring backported.
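
    For anyone wanting to check whether their own probe layout contains duplicates in the sense described above (positions that overlap or nearly overlap), here is a minimal sketch. It is not Unity's deduplication algorithm; the epsilon threshold and the function name are assumptions chosen for illustration, and the positions are assumed to have been exported from the project as (x, y, z) tuples.

```python
from collections import defaultdict
from itertools import product


def find_near_duplicates(positions, epsilon=0.01):
    """Return sorted index pairs (i, j), i < j, whose distance is below epsilon.

    epsilon is a hypothetical threshold; the thread does not state the one Unity uses.
    """
    # Bucket points into cubic cells of side epsilon, so any two points closer
    # than epsilon fall into the same cell or into directly adjacent cells.
    grid = defaultdict(list)
    for index, (x, y, z) in enumerate(positions):
        grid[(int(x // epsilon), int(y // epsilon), int(z // epsilon))].append(index)

    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j:
                        dist_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                        if dist_sq < epsilon ** 2:
                            pairs.add((i, j))
    return sorted(pairs)


sample = [(0.0, 0.0, 0.0), (0.001, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 1.0, 1.5)]
duplicates = find_near_duplicates(sample)
print(duplicates)
```

    The grid keeps the check close to linear in the number of probes for typical layouts, which matters at the 10000 to 50000 probe counts discussed in this thread.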
     
    Last edited: Jul 12, 2022
  24. TimHeijden2

    TimHeijden2

    Joined:
    Aug 11, 2016
    Posts:
    86
    Sorry I've been quite swamped with other work going on at the same time.

    I was able to reproduce the problem in the newer version as well, as you mentioned, by copying a different light probe generation configuration (one we made for the 10000 probe vs 50000 probe test). Whilst the spike is smaller, it does indeed show the same pattern.

    While this problem has been going on, we've been able to create an alternative solution reverting back to about 8 probes for an entire scene. It is a much lower quality result but somewhat acceptable, ensuring we aren't in a showstopper scenario indefinitely.

    Are the 2022.2 refactors already in the current beta, or will they be coming later? (If so, I can try upgrading the repro project to try and reproduce again)
     
  25. belgaardunity

    belgaardunity

    Unity Technologies

    Joined:
    Oct 14, 2021
    Posts:
    8
    Sorry for the late reply, I have been on vacation.
    Yes, the 2022.2 refactors are in the current beta.