
Feedback [SBP] UpdateBundleObjectLayout - 60x optimization for non-trivial bundles

Discussion in 'Asset Bundles' started by OndrejP, Aug 1, 2021.

  1. OndrejP

    Joined:
    Jul 19, 2017
    Posts:
    304
    UpdateBundleObjectLayout seems to be quite slow (and it's not cached).
    It does a lot of linear lookups, which I've optimized a bit.

    Moving 800 scripts into a separate bundle (CreateMonoScriptBundle task) took about 61s on my system.
    After optimization it takes only 8.8s, a ~7x speedup.

    EDIT:
    Another set of optimizations brought it down to 1.0s, see post #2.

    Here's the code. It would be nice if you used a similar optimization in the SBP package.
    The generated bundle content is identical.

    Code (CSharp):
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using UnityEditor.Build.Content;
    using UnityEditor.Build.Pipeline.Injector;
    using UnityEditor.Build.Pipeline.Interfaces;
    using UnityEditor.Build.Pipeline.Utilities;
    using UnityEditor.Build.Utilities;

    namespace UnityEditor.Build.Pipeline.Tasks
    {
        /// <summary>
        /// Updates the layout for bundle objects.
        /// </summary>
        public class UpdateBundleObjectLayoutFast : IBuildTask
        {
            /// <inheritdoc />
            public int Version { get { return 1; } }

    #pragma warning disable 649
            [InjectContext(ContextUsage.In, true)]
            IBundleExplictObjectLayout m_Layout;

            [InjectContext]
            IBundleBuildContent m_Content;

            [InjectContext(ContextUsage.In)]
            IDependencyData m_DependencyData;

            [InjectContext]
            IBundleWriteData m_WriteData;

            [InjectContext(ContextUsage.In)]
            IDeterministicIdentifiers m_PackingMethod;
    #pragma warning restore 649

            /// <inheritdoc />
            public ReturnCode Run()
            {
                if (m_Layout == null || m_Layout.ExplicitObjectLocation == null || m_Layout.ExplicitObjectLocation.Count == 0)
                    return ReturnCode.SuccessNotRun;

                // Reverse index: object -> assets/scenes that include or reference it.
                Dictionary<ObjectIdentifier, HashSet<GUID>> referencedBy = new Dictionary<ObjectIdentifier, HashSet<GUID>>();
                foreach (KeyValuePair<GUID, AssetLoadInfo> dependencyPair in m_DependencyData.AssetInfo)
                {
                    var asset = dependencyPair.Key;
                    var assetInfo = dependencyPair.Value;
                    LoadReferencedBy(asset, referencedBy, assetInfo.includedObjects);
                    LoadReferencedBy(asset, referencedBy, assetInfo.referencedObjects);
                }
                foreach (KeyValuePair<GUID, SceneDependencyInfo> dependencyPair in m_DependencyData.SceneInfo)
                {
                    var asset = dependencyPair.Key;
                    var assetInfo = dependencyPair.Value;
                    LoadReferencedBy(asset, referencedBy, assetInfo.referencedObjects);
                }

                // Reverse index: object -> files that currently contain it.
                Dictionary<ObjectIdentifier, List<string>> objectToFiles = new Dictionary<ObjectIdentifier, List<string>>();
                foreach (var pair in m_WriteData.FileToObjects)
                {
                    foreach (var obj in pair.Value)
                    {
                        if (!objectToFiles.TryGetValue(obj, out var fileList))
                        {
                            fileList = new List<string>();
                            objectToFiles[obj] = fileList;
                        }
                        fileList.Add(pair.Key);
                    }
                }

                foreach (var group in m_Layout.ExplicitObjectLocation.GroupBy(s => s.Value))
                {
                    var explicitObjects = new HashSet<ObjectIdentifier>(group.Select(s => s.Key));

                    string bundleName = group.Key;
                    string internalName = string.Format(CommonStrings.AssetBundleNameFormat, m_PackingMethod.GenerateInternalFileName(bundleName));

                    // Update File to bundle and Bundle layout
                    if (!m_WriteData.FileToBundle.ContainsKey(internalName))
                    {
                        m_WriteData.FileToBundle.Add(internalName, bundleName);
                        m_Content.BundleLayout.Add(bundleName, new List<GUID>()); // TODO: assets/scenes not needed here?
                    }

                    // Update File to object map
                    List<ObjectIdentifier> objectIDs;
                    if (!m_WriteData.FileToObjects.TryGetValue(internalName, out objectIDs))
                    {
                        objectIDs = new List<ObjectIdentifier>();
                        m_WriteData.FileToObjects.Add(internalName, objectIDs);
                    }
                    // Snapshot of the target file's contents, taken before any objects are moved.
                    HashSet<ObjectIdentifier> objectIDsLookup = new HashSet<ObjectIdentifier>(objectIDs);

                    foreach (ObjectIdentifier objectID in explicitObjects)
                    {
                        // Add dependency on possible new file if asset/scene depends on object
                        if (referencedBy.TryGetValue(objectID, out var assets))
                        {
                            foreach (var asset in assets)
                            {
                                var assetFiles = m_WriteData.AssetToFiles[asset];
                                if (!assetFiles.Contains(internalName))
                                    assetFiles.Add(internalName);
                            }
                        }

                        // Remove object from existing FileToObjects
                        if (objectToFiles.TryGetValue(objectID, out var files))
                        {
                            foreach (var file in files)
                            {
                                // Skip the target file: objectIDsLookup is a snapshot, so an object
                                // removed from the target here would never be re-added below.
                                if (file == internalName)
                                    continue;

                                m_WriteData.FileToObjects[file].Remove(objectID);
                            }
                        }

                        // Add object into current FileToObjects
                        if (!objectIDsLookup.Contains(objectID))
                        {
                            objectIDs.Add(objectID);
                        }
                    }
                }
                return ReturnCode.Success;
            }

            // Records, for each object, the asset/scene GUID that includes or references it.
            private void LoadReferencedBy(GUID asset, Dictionary<ObjectIdentifier, HashSet<GUID>> referencedBy, ICollection<ObjectIdentifier> referenceObjects)
            {
                foreach (var obj in referenceObjects)
                {
                    if (!referencedBy.TryGetValue(obj, out var set))
                    {
                        set = new HashSet<GUID>();
                        referencedBy.Add(obj, set);
                    }
                    set.Add(asset);
                }
            }
        }
    }
     
    Last edited: Aug 1, 2021
  2. OndrejP

    Joined:
    Jul 19, 2017
    Posts:
    304
    Did another set of optimizations and now it takes 1.0s, a total ~60x speedup!
    I mean, is nobody working on SBP?

    Seriously, this is a stable package we're talking about. It's the basis for Addressables, which is a core feature.
    It seems to me that there's plenty of room for optimization; someone reliable should start working on that (hashing methods could be improved, cache serialization could be optimized).
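
One of the changes in the code below is that removals from FileToObjects are no longer done one object at a time. They are collected per file and applied with a single RemoveAll call at the end. A rough standalone sketch of that idea, with plain strings standing in for ObjectIdentifier; the BatchRemoval class and its method names are made up for illustration and are not part of SBP:

```csharp
using System;
using System.Collections.Generic;

// Illustration only: strings stand in for ObjectIdentifier values.
public static class BatchRemoval
{
    // Each List.Remove call scans the list from the start,
    // so removing k objects from a list of n costs roughly O(k * n).
    public static void RemoveOneByOne(List<string> fileObjects, IEnumerable<string> moved)
    {
        foreach (var id in moved)
            fileObjects.Remove(id);
    }

    // One pass over the list with O(1) set lookups, roughly O(n + k).
    // Note: RemoveAll drops every occurrence, while Remove drops only the
    // first one; the results match as long as the list has no duplicates.
    public static void RemoveBatched(List<string> fileObjects, IEnumerable<string> moved)
    {
        var toRemove = new HashSet<string>(moved);
        fileObjects.RemoveAll(toRemove.Contains);
    }
}
```

Both methods leave the list in the same state when it has no duplicate entries; only the amount of scanning differs.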

    Code (CSharp):
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using UnityEditor.Build.Content;
    using UnityEditor.Build.Pipeline.Injector;
    using UnityEditor.Build.Pipeline.Interfaces;
    using UnityEditor.Build.Pipeline.Utilities;
    using UnityEditor.Build.Utilities;

    namespace UnityEditor.Build.Pipeline.Tasks
    {
        /// <summary>
        /// Updates the layout for bundle objects.
        /// </summary>
        public class UpdateBundleObjectLayoutFast : IBuildTask
        {
            /// <inheritdoc />
            public int Version { get { return 1; } }

    #pragma warning disable 649
            [InjectContext(ContextUsage.In, true)]
            IBundleExplictObjectLayout m_Layout;

            [InjectContext]
            IBundleBuildContent m_Content;

            [InjectContext(ContextUsage.In)]
            IDependencyData m_DependencyData;

            [InjectContext]
            IBundleWriteData m_WriteData;

            [InjectContext(ContextUsage.In)]
            IDeterministicIdentifiers m_PackingMethod;

            [InjectContext(ContextUsage.In, true)]
            IBuildLogger m_Log;
    #pragma warning restore 649

            /// <inheritdoc />
            public ReturnCode Run()
            {
                if (m_Layout == null || m_Layout.ExplicitObjectLocation == null || m_Layout.ExplicitObjectLocation.Count == 0)
                    return ReturnCode.SuccessNotRun;

                Dictionary<ObjectIdentifier, HashSet<GUID>> referencedBy = new Dictionary<ObjectIdentifier, HashSet<GUID>>();
                using (m_Log.ScopedStep(LogLevel.Info, "LoadReferencedBy (assets)"))
                {
                    foreach (KeyValuePair<GUID, AssetLoadInfo> dependencyPair in m_DependencyData.AssetInfo)
                    {
                        var asset = dependencyPair.Key;
                        var assetInfo = dependencyPair.Value;
                        LoadReferencedBy(asset, referencedBy, assetInfo.includedObjects);
                        LoadReferencedBy(asset, referencedBy, assetInfo.referencedObjects);
                    }
                }

                using (m_Log.ScopedStep(LogLevel.Info, "LoadReferencedBy (scenes)"))
                {
                    foreach (KeyValuePair<GUID, SceneDependencyInfo> dependencyPair in m_DependencyData.SceneInfo)
                    {
                        var asset = dependencyPair.Key;
                        var assetInfo = dependencyPair.Value;
                        LoadReferencedBy(asset, referencedBy, assetInfo.referencedObjects);
                    }
                }
                var referencedBy2 = referencedBy.ToDictionary(s => s.Key, s => s.Value.ToArray());

                Dictionary<ObjectIdentifier, List<string>> objectToFiles = new Dictionary<ObjectIdentifier, List<string>>();
                using (m_Log.ScopedStep(LogLevel.Info, "LoadObjectToFiles"))
                {
                    foreach (var pair in m_WriteData.FileToObjects)
                    {
                        foreach (var obj in pair.Value)
                        {
                            if (!objectToFiles.TryGetValue(obj, out var fileList))
                            {
                                fileList = new List<string>();
                                objectToFiles[obj] = fileList;
                            }
                            fileList.Add(pair.Key);
                        }
                    }
                }

                Dictionary<string, HashSet<ObjectIdentifier>> fileRemoves = new Dictionary<string, HashSet<ObjectIdentifier>>();
                foreach (var group in m_Layout.ExplicitObjectLocation.GroupBy(s => s.Value))
                {
                    string bundleName = group.Key;
                    string internalName = string.Format(CommonStrings.AssetBundleNameFormat, m_PackingMethod.GenerateInternalFileName(bundleName));

                    List<ObjectIdentifier> objectIDs;
                    HashSet<ObjectIdentifier> objectIDsLookup;
                    HashSet<GUID> assetsNeedingFileReference = new HashSet<GUID>();
                    using (m_Log.ScopedStep(LogLevel.Info, $"MoveObjects ({bundleName}) - prepare"))
                    {
                        // Update File to bundle and Bundle layout
                        if (!m_WriteData.FileToBundle.ContainsKey(internalName))
                        {
                            m_WriteData.FileToBundle.Add(internalName, bundleName);
                            m_Content.BundleLayout.Add(bundleName, new List<GUID>()); // TODO: assets/scenes not needed here? - also empty in original code
                        }

                        // Update File to object map
                        if (!m_WriteData.FileToObjects.TryGetValue(internalName, out objectIDs))
                        {
                            objectIDs = new List<ObjectIdentifier>();
                            m_WriteData.FileToObjects.Add(internalName, objectIDs);
                        }
                        objectIDsLookup = new HashSet<ObjectIdentifier>(objectIDs);
                    }
                    using (m_Log.ScopedStep(LogLevel.Info, $"MoveObjects ({bundleName}) - move"))
                    {
                        var explicitObjects = new HashSet<ObjectIdentifier>(group.Select(s => s.Key));
                        foreach (ObjectIdentifier objectID in explicitObjects)
                        {
                            // Add dependency on possible new file if asset/scene depends on object
                            if (referencedBy2.TryGetValue(objectID, out var assets))
                            {
                                foreach (var item in assets)
                                {
                                    assetsNeedingFileReference.Add(item);
                                }
                            }

                            // Remove object from existing FileToObjects
                            if (objectToFiles.TryGetValue(objectID, out var files))
                            {
                                foreach (var file in files)
                                {
                                    if (file == internalName)
                                        continue;

                                    if (!fileRemoves.TryGetValue(file, out var removeSet))
                                    {
                                        removeSet = new HashSet<ObjectIdentifier>();
                                        fileRemoves.Add(file, removeSet);
                                    }
                                    removeSet.Add(objectID);
                                }
                            }

                            // Add object into current FileToObjects
                            if (!objectIDsLookup.Contains(objectID))
                            {
                                objectIDs.Add(objectID);
                            }
                        }
                    }
                    using (m_Log.ScopedStep(LogLevel.Info, $"MoveObjects ({bundleName}) - add file references"))
                    {
                        // Add dependency on possible new file if asset/scene depends on object
                        foreach (var asset in assetsNeedingFileReference)
                        {
                            var fileList = m_WriteData.AssetToFiles[asset];
                            if (!fileList.Contains(internalName))
                                fileList.Add(internalName);
                        }
                    }
                }
                using (m_Log.ScopedStep(LogLevel.Info, "Remove FileToObject references"))
                {
                    foreach (var pair in fileRemoves)
                    {
                        m_WriteData.FileToObjects[pair.Key].RemoveAll(s => pair.Value.Contains(s));
                    }
                }
                return ReturnCode.Success;
            }

            private void LoadReferencedBy(GUID asset, Dictionary<ObjectIdentifier, HashSet<GUID>> referencedBy, ICollection<ObjectIdentifier> referenceObjects)
            {
                foreach (var obj in referenceObjects)
                {
                    if (!referencedBy.TryGetValue(obj, out var set))
                    {
                        set = new HashSet<GUID>();
                        referencedBy.Add(obj, set);
                    }
                    set.Add(asset);
                }
            }
        }
    }
     
    Last edited: Aug 1, 2021
  3. Ryanc_unity

    Unity Technologies

    Joined:
    Jul 22, 2015
    Posts:
    332
    Thanks for the information, I've added a ticket to look into this.
     
  4. OndrejP

    Joined:
    Jul 19, 2017
    Posts:
    304
    @Ryanc_unity Thanks, could you also expose some parts of the BuildCache (currently mostly internal) so I can write my own build tasks and utilize the ability to cache the results?

    Build Cache
    Are there any plans to improve the access to build cache from C#?
    Currently it seems to be just a bundle of 'object' data that uses ISerializable.
    I'd love to have something more structured, because it's hard to maintain the code which uses things like:
    Cache.Data[0] as BuildTagSet
    Cache.Data[1] as List<ObjectTypes>

    I would prefer the cache entry for an asset to be one class and the cache entry for a scene dependency to be a different class.
    These classes could contain the basic data, and other "custom" data could be stored in a dictionary with a string key. That would be easier to maintain than a bunch of unknown objects where each index means something else.
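
To make the idea concrete, here is a minimal sketch of what such typed entries might look like. Every type and member name here (CacheEntryData, AssetCacheEntry, SceneCacheEntry, Custom, TryGetCustom) is hypothetical and nothing like it exists in SBP; strings stand in for GUIDs and object identifiers so the sketch compiles outside Unity:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base type: shared storage for "custom" data keyed by name
// instead of by position in an untyped object array.
public abstract class CacheEntryData
{
    public Dictionary<string, object> Custom { get; } = new Dictionary<string, object>();

    // Returns true only if the key exists AND the stored value has type T.
    public bool TryGetCustom<T>(string key, out T value)
    {
        if (Custom.TryGetValue(key, out var raw) && raw is T typed)
        {
            value = typed;
            return true;
        }
        value = default(T);
        return false;
    }
}

// Hypothetical entry for an asset.
public sealed class AssetCacheEntry : CacheEntryData
{
    public string AssetGuid = "";
    public List<string> IncludedObjects = new List<string>();
    public List<string> ReferencedObjects = new List<string>();
}

// Hypothetical entry for a scene dependency.
public sealed class SceneCacheEntry : CacheEntryData
{
    public string SceneGuid = "";
    public List<string> ReferencedObjects = new List<string>();
}
```

With something like this, a lookup such as `entry.TryGetCustom<List<string>>("objectTypes", out var types)` would replace the index-based cast, and a wrong type would fail visibly instead of silently yielding null.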

    Bundle Layouts
    In my workflow, I've found that I usually don't know the full bundle layout before starting the build. I only know the roots (scenes and some other objects). I would like to be able to define the bundle layout during the build process. This is not very hard for scenes (just add another build task after CalculateSceneDependencyData and add assets into m_Content.Assets, Addresses, BundleLayout), but it's nearly impossible for assets (dependencies are loaded and filtered in the same step).

    I think it would be great if some build tasks were split into smaller tasks.
    I would supply the roots (scenes, assets) to build pipeline through IBundleBuildContent.
    Build pipeline would crunch the assets (use cache) and load direct assets dependencies for all assets recursively.
    I would be able to put my own tasks into pipeline, which would use the data and define final bundle layouts.
    Then the tasks calculating dependencies would be run.

    I've looked into NonRecursiveDependencyData, which goes in a good direction, but it's not nearly there.
    I'd love to have a dependency graph of all assets pulled into the build, generated at the start of the build. NonRecursiveDependencyData does that, but only for assets DIRECTLY included in the build, not for their dependencies (they just get squashed into a single list, and build usage tags get squashed as well).

    This could also improve build cache performance. Assets not directly included in the build are now processed multiple times.

    Example:
    AssetA - HugePrefabC - Textures, Meshes, Materials...
    AssetB - HugePrefabC - Textures, Meshes, Materials...

    If I put AssetA and AssetB directly into an asset bundle, CalculateAssetDependencyData now processes the dependencies of HugePrefabC twice (ContentBuildInterface.GetPlayerDependenciesForObjects called in ExtensionMethods.FilterReferencedObjectIDs).

    Also, if AssetA had another dependency and that dependency changed, it would invalidate the cache entry for AssetA and recalculate the dependencies of HugePrefabC again.

    What I'm trying to say is that smaller granularity of cache would help here. I haven't done the measurements, but the downside seems to be that ContentBuildInterface.GetPlayerDependenciesForObjects could not be "batched" so well and that could make non-cached build considerably slower. But I'd be happy to pay the price, because subsequent builds would be considerably faster, since only a minority of assets usually change.
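
A small sketch of the finer-grained caching described above, using the AssetA / AssetB / HugePrefabC example. This is an assumption-laden illustration, not how SBP works: PerAssetDependencyCache is a made-up class, strings stand in for GUIDs, and the getDirectDependencies delegate stands in for whatever expensive per-asset query the pipeline would run.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: caches DIRECT dependencies per asset, so a shared
// dependency is queried once no matter how many roots pull it in.
public sealed class PerAssetDependencyCache
{
    readonly Func<string, IEnumerable<string>> m_GetDirectDependencies;
    readonly Dictionary<string, string[]> m_Direct = new Dictionary<string, string[]>();

    // Number of times the expensive query actually ran.
    public int QueryCount { get; private set; }

    public PerAssetDependencyCache(Func<string, IEnumerable<string>> getDirectDependencies)
    {
        m_GetDirectDependencies = getDirectDependencies;
    }

    string[] GetDirect(string asset)
    {
        if (!m_Direct.TryGetValue(asset, out var deps))
        {
            QueryCount++;
            deps = new List<string>(m_GetDirectDependencies(asset)).ToArray();
            m_Direct.Add(asset, deps);
        }
        return deps;
    }

    // Returns every asset reachable from root by walking cached direct dependencies.
    public HashSet<string> GetAllDependencies(string root)
    {
        var visited = new HashSet<string>();
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            foreach (var dep in GetDirect(pending.Pop()))
            {
                if (visited.Add(dep))
                    pending.Push(dep);
            }
        }
        return visited;
    }
}
```

In this sketch, resolving AssetA and then AssetB queries HugePrefabC and its textures and meshes only once. Invalidating one asset would mean dropping only that asset's entry from the dictionary, leaving HugePrefabC's entry intact.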

    I'd be happy to discuss my use cases for Scriptable Build Pipeline, if you're interested.
     
  5. Ryanc_unity

    Unity Technologies

    Joined:
    Jul 22, 2015
    Posts:
    332
    > Build Cache

    The BuildCache is an interesting beast. It and the USerializer system are very tightly coupled to the exact needs of the build pipeline. If we open the BuildCache up for more direct use, the USerializer system also needs a lot more work to increase its serialization feature and type support. At this time it's unlikely we will focus on exposing this area, as our next plans are to move the BuildCache's info files from file-based caching to a memory-mapped data structure to reduce file IO, which in turn gives us a lot more performance for the next step:

    > Build pipeline would crunch the assets (use cache) and load direct assets dependencies for all assets recursively.

    This is very similar to our next steps. We were afraid of file IO early on and cached the inputs and their recursive dependencies in a flattened list. Non-Recursive Dependency mode improves this a little by only calculating and caching up until it encounters another asset flagged to be in a bundle. So our next plans are to cache only per-asset direct dependencies and to recursively calculate and cache each dependency encountered, as this can have a big impact for large projects with very complex and nested dependency chains. I'll keep your use case in mind and try to make sure we have a way to achieve that with this change in some form.

    > What I'm trying to say is that smaller granularity of cache would help here. I haven't done the measurements, but the downside seems to be that ContentBuildInterface.GetPlayerDependenciesForObjects could not be "batched" so well and that could make non-cached build considerably slower.

    We've recently done quite a bit of deep profiling on SBP. While I admit the APIs we created for SBP v1 are not the most efficient (and we have a newer set of APIs designed and prototyped that are), far more time is wasted in generic code paths that are used for more than just builds. Those are the bigger problem, and we are improving them now.
     
  6. OndrejP

    Joined:
    Jul 19, 2017
    Posts:
    304
    > Build Cache

    I understand that universal serialization support would be a lot of work, but I don't expect universal support. All I need is some interface which would allow me to serialize/deserialize the cached object into/from byte array. I'd use it in rare cases anyway, so writing serialization code by hand would not be an issue.
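
As a sketch of how small that interface could be: the ICacheSerializable interface and the MovedObjectsRecord class below are both hypothetical (SBP exposes nothing like them), and the record's fields are invented purely to show hand-written serialization with BinaryWriter and BinaryReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical interface: the cache would only ever see opaque bytes.
public interface ICacheSerializable
{
    byte[] Serialize();
    void Deserialize(byte[] data);
}

// Hypothetical payload written by a custom build task.
public sealed class MovedObjectsRecord : ICacheSerializable
{
    public string BundleName = "";
    public List<long> LocalIds = new List<long>();

    public byte[] Serialize()
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            // Layout: bundle name, element count, then each id.
            writer.Write(BundleName);
            writer.Write(LocalIds.Count);
            foreach (var id in LocalIds)
                writer.Write(id);
            writer.Flush();
            return stream.ToArray();
        }
    }

    public void Deserialize(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            // Must read fields in the same order Serialize wrote them.
            BundleName = reader.ReadString();
            int count = reader.ReadInt32();
            LocalIds = new List<long>(count);
            for (int i = 0; i < count; i++)
                LocalIds.Add(reader.ReadInt64());
        }
    }
}
```

The serialization code is written by hand per type, which is tedious but keeps the cache itself free of any general type support.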
    My need for this was mostly because of the "flattening" so it might not be needed in the end anyway.

    > Build pipeline would crunch the assets (use cache) and load direct assets dependencies for all assets recursively.

    This is exactly the direction I was hoping for; it will make incremental builds faster. If I'm able to define the bundle layout from processed assets before the next part of the build takes place, that would be awesome! For now, I'm using 'UpdateBundleObjectLayout', which does not expose assets (does not give them an address), but at least moves them and prevents duplication.

    Glad to hear more optimizations "on the engine side" are coming.

    One last thing: are there any plans for validation? It has happened to me a lot that I made some changes and ended up with broken bundles, missing references etc. It would be okay if the validation had to be turned on by some bundle build option.