
AssetBundle loading API

Discussion in '5.3 Beta' started by alexeyzakharov, Oct 1, 2015.

  1. alexeyzakharov

    Joined: Jul 2, 2014
    Posts: 507
    On behalf of @kae:

    The new incremental asset bundle building and its deterministic build pipeline are a great leap forward for Unity, and I hope to see similar improvements in the player pipeline going forward.

    That said, there are a few things in the API I'd like to see changed or added. If you need more background on our use case, just say so and I'll write it out; I'll start with a short version.
    We pack each asset file into its own asset bundle. Our workflow needs fast incremental asset bundle builds, followed by a later stage that packs the built bundles into large packfiles, each containing many bundles, for deployment with a player. We want both stages to be fast, but we also want the bundles in the packfiles to be small. Currently we build the asset bundles uncompressed to keep build times low, and in the packfile step we LZ4-compress them, optionally using LZ4HC for release builds.

    1. The current asset bundle loading APIs are suboptimal for loading asset bundles from packfiles. CreateFromMemory requires allocating massive byte arrays on the C# heap, and CreateFromFile requires extracting the bundles into a large number of separate files.
    I see a few options for extending the loading API here.

    1a.
    AssetBundle AssetBundle.CreateFromFileOffset(System.IntPtr fileHandle, long offset);
    This is the simplest API, but it will not work with custom decompression. I used a file handle parameter instead of a path because packfile implementations usually keep the handle open anyway.
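
    To make the seek-and-read idea behind 1a concrete outside Unity, here is a minimal plain .NET sketch. It assumes a hypothetical packfile layout (bundles concatenated back to back, plus an in-memory name to (offset, length) index), and ReadBundle is a made-up helper; it copies one bundle into a managed array only to show the access pattern that an offset-taking loader would perform natively.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical packfile layout: bundles are concatenated back to back, and an
// in-memory index maps each bundle name to its (offset, length) within the file.
string packPath = Path.Combine(Path.GetTempPath(), "example.pack");
var index = new Dictionary<string, (long Offset, long Length)>();

using (var pack = new FileStream(packPath, FileMode.Create, FileAccess.Write))
{
    foreach (var (name, contents) in new[] { ("a.bundle", "AAAA"), ("b.bundle", "BBBBBBBB"), ("c.bundle", "CC") })
    {
        byte[] bytes = Encoding.ASCII.GetBytes(contents);
        index[name] = (pack.Position, bytes.Length);
        pack.Write(bytes, 0, bytes.Length);
    }
}

// Reads exactly one bundle's byte range through an already-open handle by seeking
// to its offset; the rest of the packfile is never read.
byte[] ReadBundle(FileStream packFile, (long Offset, long Length) entry)
{
    var result = new byte[entry.Length];
    packFile.Seek(entry.Offset, SeekOrigin.Begin);
    int filled = 0;
    while (filled < result.Length)
    {
        // Stream.Read may return fewer bytes than requested, so loop until full.
        int read = packFile.Read(result, filled, result.Length - filled);
        if (read == 0) throw new EndOfStreamException("Packfile is truncated.");
        filled += read;
    }
    return result;
}

string bundleB;
using (var handle = new FileStream(packPath, FileMode.Open, FileAccess.Read))
{
    bundleB = Encoding.ASCII.GetString(ReadBundle(handle, index["b.bundle"]));
}
File.Delete(packPath);
Console.WriteLine(bundleB);
```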

    1b.
    class AssetBundleStreamContext
    {
        // Pushes data to the asset bundle loading system.
        // Returns an AssetBundle if it can be loaded from the buffered data or if it's already loaded, otherwise null.
        // This function should be called repeatedly to supply more data until the bundle can be created.
        public AssetBundle Load(byte[] buffer, int offset, int count);
        // Discards all internal state for the context, but does not affect a loaded AssetBundle.
        // This allows the AssetBundleStreamContext heap object to be reused.
        public void Clear();
    }
    AssetBundleStreamContext AssetBundle.CreateStreamContext();

    This is a streaming API in which the application has full control over pushing data into the loading system.
    It is the most flexible option, supporting streaming from any medium with any amount of latency. If the API may be called from threads other than the main thread, it also lets the user control which threads asset bundles are loaded on.
    For those who cannot use IL2CPP, it also avoids native-to-managed transitions, which is a minor performance benefit.
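
    A minimal sketch of the push model in 1b, written as plain C# rather than against a real Unity API. The Load and Clear local functions are hypothetical stand-ins for the proposed AssetBundleStreamContext methods, and the length-prefixed framing is an assumption made so that "enough data has arrived" is well defined. Unlike the proposal, this sketch buffers in a managed MemoryStream.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical stand-in for the proposed AssetBundleStreamContext. Assumed framing
// for this sketch: an 8-byte little-endian length prefix, then the payload.
var buffered = new MemoryStream();

// Pushes a chunk; returns the complete payload once enough data has arrived, otherwise null.
byte[]? Load(byte[] buffer, int offset, int count)
{
    buffered.Write(buffer, offset, count);
    if (buffered.Length < 8) return null;
    byte[] data = buffered.GetBuffer();
    long payloadLength = BinaryPrimitives.ReadInt64LittleEndian(data.AsSpan(0, 8));
    if (buffered.Length < 8 + payloadLength) return null;
    return data.AsSpan(8, (int)payloadLength).ToArray();
}

// Discards buffered state so the same context can be reused for the next bundle.
void Clear() => buffered.SetLength(0);

// The caller decides where bytes come from and which thread pushes them; here a
// framed message arrives in 5-byte chunks, as it might from a network source.
byte[] payload = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
byte[] framed = new byte[8 + payload.Length];
BinaryPrimitives.WriteInt64LittleEndian(framed.AsSpan(0, 8), payload.Length);
payload.CopyTo(framed, 8);

byte[]? loaded = null;
int pushes = 0;
for (int pos = 0; pos < framed.Length && loaded == null; pos += 5)
{
    loaded = Load(framed, pos, Math.Min(5, framed.Length - pos));
    pushes++;
}
Console.WriteLine($"Loaded {loaded?.Length} bytes after {pushes} pushes");
Clear();
```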

    1c.
    // Returns false if there is no more data to be supplied, otherwise true.
    delegate bool AssetBundleReadCallback(byte[] buffer, out int bytesRead, object userObject);
    AssetBundle AssetBundle.CreateFromCallback(AssetBundleReadCallback readCallback, object userObject);
    AssetBundleCreateRequest AssetBundle.CreateFromCallbackAsync(AssetBundleReadCallback readCallback, object userObject);

    This type of API potentially lets Unity be in charge of threading, but it is not suitable for high-latency media, where the data might not be available synchronously and many loads are running concurrently.
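
    For comparison, a sketch of the pull model in 1c. It is hypothetical and simplified: the callback returns the number of bytes written (0 meaning no more data) instead of using the bool/out int/userObject shape proposed above, with the closure playing the role of userObject. The loader owns the loop, which is why each call has to produce data synchronously.

```csharp
using System;
using System.IO;

// Simplified, hypothetical stand-in for CreateFromCallback: the loader repeatedly
// pulls from the callback until it reports 0 bytes (no more data).
byte[] CreateFromCallback(Func<byte[], int> readCallback)
{
    var collected = new MemoryStream();
    var buffer = new byte[4096];
    int bytesRead;
    // The loader, not the application, drives this loop, so each callback must
    // return data synchronously; a high-latency source would block here.
    while ((bytesRead = readCallback(buffer)) > 0)
        collected.Write(buffer, 0, bytesRead);
    return collected.ToArray();
}

// Application side: serve the bundle's bytes from any source, here an in-memory stream.
var source = new MemoryStream(new byte[10_000]);
int calls = 0;
byte[] bundle = CreateFromCallback(buffer => { calls++; return source.Read(buffer, 0, buffer.Length); });
Console.WriteLine($"{bundle.Length} bytes in {calls} callback invocations");
```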

    Personally, I'd be OK with either 1b or 1c. We currently use custom LZ4 compression, so 1a is not an option for us, although that might change with the asset bundle compression API changes detailed in points 2 and 3.


    2. It'd be nice to be able to compress an already-built asset bundle without rebuilding it from the source assets.

    bool BuildPipeline.RecompressAssetBundles(string[] assetBundlePaths, string[] outputPaths, BuildCompressionSettings newCompressionSettings);

    The API would read the asset bundles at the paths in assetBundlePaths, decompress them if necessary, then recompress them and write each result to the corresponding path in outputPaths.
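
    A rough, runnable approximation of what RecompressAssetBundles could do, under stated assumptions: GZip stands in for LZ4/LZ4HC, Recompress is a local helper rather than a Unity API, and "decompress if necessary" is approximated by checking for the GZip magic bytes 0x1F 0x8B.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical recompression pass in the spirit of the proposed
// BuildPipeline.RecompressAssetBundles. GZip stands in for LZ4/LZ4HC.
bool Recompress(string[] inputPaths, string[] outputPaths, CompressionLevel level)
{
    if (inputPaths.Length != outputPaths.Length) return false;
    for (int i = 0; i < inputPaths.Length; i++)
    {
        byte[] raw = File.ReadAllBytes(inputPaths[i]);
        // Already compressed (GZip magic bytes)? Decompress first.
        if (raw.Length >= 2 && raw[0] == 0x1F && raw[1] == 0x8B)
        {
            using var gunzip = new GZipStream(new MemoryStream(raw), CompressionMode.Decompress);
            using var plain = new MemoryStream();
            gunzip.CopyTo(plain);
            raw = plain.ToArray();
        }
        // Disposal order (gzip, then output) flushes the compressor before closing the file.
        using var output = File.Create(outputPaths[i]);
        using var gzip = new GZipStream(output, level);
        gzip.Write(raw, 0, raw.Length);
    }
    return true;
}

string dir = Path.Combine(Path.GetTempPath(), "recompress-demo");
Directory.CreateDirectory(dir);
string srcPath = Path.Combine(dir, "bundle.uncompressed");
string dstPath = Path.Combine(dir, "bundle.gz");
File.WriteAllBytes(srcPath, new byte[64 * 1024]); // highly compressible placeholder data
bool ok = Recompress(new[] { srcPath }, new[] { dstPath }, CompressionLevel.Optimal);
Console.WriteLine($"{ok}: {new FileInfo(srcPath).Length} -> {new FileInfo(dstPath).Length} bytes");
```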

    3. It's important to be able to select which compressor and settings to use. The difference in compression speed between default LZ4 and LZ4HC is quite significant.

    To build on Alexey's suggestion, I'd propose a BuildCompressionSettings struct that can be passed to all of the BuildPipeline Build* calls.
    I would also make streaming a CompressionMethod value instead of a bool.

    struct BuildCompressionSettings
    {
        public CompressionMethod method; // Compression method. Some methods share a decompressor, e.g. LZ4/LZ4HC
        public int level;                // compressor-specific value trading compression ratio against speed
        public int dictSize;             // memory to dedicate to the dictionary
        public int blockSize;            // size of a compression block
    }

    Some fields are meaningless for some compressors. A value of 0 means "use the default" for every compressor.
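
    The "0 means default" rule amounts to a per-field fallback lookup keyed by compression method. This is a sketch using a named tuple in place of the struct; the method names are plain strings here, and every default value is a placeholder rather than Unity's or LZ4's actual default.

```csharp
using System;
using System.Collections.Generic;

// Placeholder per-method defaults (illustrative values only).
var defaults = new Dictionary<string, (int Level, int DictSize, int BlockSize)>
{
    ["LZ4"]   = (Level: 1, DictSize: 64 * 1024, BlockSize: 128 * 1024),
    ["LZ4HC"] = (Level: 9, DictSize: 64 * 1024, BlockSize: 128 * 1024),
};

// Replaces every zero field with the selected method's default.
(string Method, int Level, int DictSize, int BlockSize) Resolve(
    (string Method, int Level, int DictSize, int BlockSize) s)
{
    var d = defaults[s.Method];
    return (s.Method,
            s.Level != 0 ? s.Level : d.Level,
            s.DictSize != 0 ? s.DictSize : d.DictSize,
            s.BlockSize != 0 ? s.BlockSize : d.BlockSize);
}

// Only the level is overridden; dictionary and block sizes fall back to the defaults.
var resolved = Resolve(("LZ4HC", 12, 0, 0));
Console.WriteLine(resolved);
```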


    4. This is a bit of a wishlist point really, but I'll post it anyway because I think there can be real value in it.
    I'd like all of the BuildPipeline APIs to support custom read and write callbacks instead of using the file system directly. In our pipeline each file is built into its own asset bundle, so we have tens of thousands of files cluttering up the file system. I partition them into directories using the first two characters of the hash's hex string, so it doesn't slow down too badly, but I'd rather store them in a way that is better optimized for reading and writing. I know, I know, just buy SSDs, right? Sigh.
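
    The directory partitioning described above can be sketched as follows. Hashing the bundle name with SHA-1 is an assumption (the post does not say which hash is used), and PartitionedPath is a hypothetical helper. Two hex characters spread the bundles across 256 subdirectories.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Maps a bundle name to <root>/<first two hex chars>/<full hex hash>.
string PartitionedPath(string root, string bundleName)
{
    using var sha1 = SHA1.Create();
    byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(bundleName));
    string hex = Convert.ToHexString(hash).ToLowerInvariant();
    return Path.Combine(root, hex.Substring(0, 2), hex);
}

string examplePath = PartitionedPath("bundles", "abc");
Console.WriteLine(examplePath);
```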

    Let's make the asset pipeline amazing together! Tell me what you think.
     
  2. alexeyzakharov

    Joined: Jul 2, 2014
    Posts: 507
    Hi Karl!

    1a. A System.IntPtr fileHandle is very platform-dependent and would mean creating and maintaining a low-level C# file system API.
    Adding an offset to the existing CreateFromFile API makes sense, and it was actually there before.

    1b. This solution looks like it requires caching a whole AssetBundle internally. In some cases (chunk-compressed or uncompressed bundles) we read only the header to inject resources into the system, and read the data only when LoadAsset is called; so without a callback, it is equivalent to LoadFromMemory(byte[] binary, uint index, uint count, uint crc = 0).
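
    Why chunk-compressed bundles can defer reading data: if each fixed-size block is compressed independently and a header table records where every compressed block starts, a reader can parse just the table and later decompress only the block it needs. This sketch uses Deflate and a 4 KB block size as stand-ins; it illustrates the general technique, not Unity's actual bundle format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

const int BlockSize = 4096; // assumed block size for this sketch

// Compresses each block independently and records (offset, length) of each compressed block.
(byte[] Body, List<(long Offset, int Length)> Table) CompressInBlocks(byte[] data)
{
    var body = new MemoryStream();
    var table = new List<(long Offset, int Length)>();
    for (int start = 0; start < data.Length; start += BlockSize)
    {
        long offset = body.Position;
        using (var deflate = new DeflateStream(body, CompressionLevel.Optimal, leaveOpen: true))
            deflate.Write(data, start, Math.Min(BlockSize, data.Length - start));
        table.Add((offset, (int)(body.Position - offset)));
    }
    return (body.ToArray(), table);
}

// Decompresses only the block containing `position`, located via the table.
byte ReadByteAt(byte[] body, List<(long Offset, int Length)> table, long position)
{
    var (offset, length) = table[(int)(position / BlockSize)];
    using var block = new DeflateStream(
        new MemoryStream(body, (int)offset, length), CompressionMode.Decompress);
    var plain = new MemoryStream();
    block.CopyTo(plain);
    return plain.GetBuffer()[(int)(position % BlockSize)];
}

var bundleBytes = new byte[20_000];
for (int i = 0; i < bundleBytes.Length; i++) bundleBytes[i] = (byte)(i / 100);
var (compressedBody, blockTable) = CompressInBlocks(bundleBytes);
Console.WriteLine($"{blockTable.Count} blocks, {compressedBody.Length} compressed bytes");
Console.WriteLine(ReadByteAt(compressedBody, blockTable, 12_345)); // decompresses block 3 only
```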

    1c. This proposal gives complete freedom over the asset bundle's data source. However, I would personally prefer the standard IO.Stream as a universal, chainable solution.
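
    The appeal of a Stream-based entry point is that decompression and data sources can be chained in front of the loader. In this sketch LoadFromStream is a hypothetical placeholder that merely counts and checksums bytes, and DeflateStream stands in for the custom LZ4 decompression discussed in the thread, since .NET has no built-in LZ4 codec.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical Stream-based loader: it only sees a Stream and reads it in small
// chunks; counting and summing bytes stands in for real bundle parsing.
(long Length, long Sum) LoadFromStream(Stream input)
{
    var chunk = new byte[1024];
    long length = 0, sum = 0;
    int read;
    while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
    {
        length += read;
        for (int i = 0; i < read; i++) sum += chunk[i];
    }
    return (length, sum);
}

// Build a compressed "bundle" in memory.
var original = new byte[5000];
for (int i = 0; i < original.Length; i++) original[i] = (byte)(i % 7);
var compressed = new MemoryStream();
using (var deflate = new DeflateStream(compressed, CompressionLevel.Optimal, leaveOpen: true))
    deflate.Write(original, 0, original.Length);
compressed.Position = 0;

// Chain: raw source -> decompressor -> loader. Swapping the source (file, packfile
// slice, socket) or the codec does not change the loader.
using var decompressing = new DeflateStream(compressed, CompressionMode.Decompress);
var result = LoadFromStream(decompressing);
Console.WriteLine($"{result.Length} bytes, checksum {result.Sum}");
```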

    2. Agree, could be useful.

    3. This is something we will definitely have! However, the first step is to allow simple compression options to be specified, and then to consider per-asset compression settings. I'll try to land it in later betas.

    4. Which kind of callbacks are you talking about?
    BuildPipeline serializes data directly from an awakened asset in memory to a temporary destination file. It does not simply read the Library folder representation and save it to another file: the asset is loaded from disk and then serialized using the specific build settings.

    I've created a Google Doc with an API proposal that covers your three suggestions:
    https://docs.google.com/document/d/1K3zgOzcZVHQPbUIL-8joBw5gCBQMFmFxif94kwreQ_Y/edit
    Feel free to comment.

    And thanks a lot for the great feedback!
     
  3. kae

    Joined: Sep 27, 2012
    Posts: 12
    Thanks for the reply Alexey!

    1a. Yep, I agree. There is System.IO.FileStream.SafeFileHandle, but the System.IO.Stream API will cover this case anyway.

    1b. Right, it would require buffering as much data as is needed to serve the AssetBundle API. For our use case, we load all the assets in an asset bundle as soon as the bundle is created, then unload the bundle. Maybe I should be asking for an AssetBundle.LoadObjectsFromBundle that forgoes the AssetBundle object entirely and just gives me the UnityEngine.Object[]? That way you wouldn't need to fulfill all of the AssetBundle API.
    Regardless, it differs from LoadFromMemory in that we don't need to allocate a byte[] array as large as the bundle on the Mono heap.

    1c. Yep, I agree with using the IO.Stream API.

    I'd prefer an API like 1b because it lets me control which threads the reading is done on. Imagine streaming the asset bundles across the network, or from any other asynchronous medium where the data is received off the main thread. The 1c API would require buffering data between callbacks and shuffling it to another thread.
    It also gives me total control over the order in which bundles are loaded. Imagine a packfile with a thousand bundles. I want the file system reads to be entirely sequential to avoid mechanical drive seeks, which means I need to guarantee the order in which asset bundles are created for that packfile. If I want to use more cores, I can increase parallelism by splitting bundles into more packfiles and using one job per packfile.
    AssetBundle.LoadFromStreamAsync shouldn't guarantee ordering across all loads if it is to scale in the future. I'll want to use all my cores, not just load everything off a single worker thread, but in the specific case of mechanical disk drives, unordered loading from a single packfile is not a good thing. Maybe we could add an integer argument as a sort of partitioning key for distributing work across worker threads to solve the ordering problem?
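
    A sketch of how a partitioning key could preserve ordering without giving up parallelism: requests that share a key run one after another in submission order on a single task, while different keys run concurrently on the thread pool. The request list and the LoadBundle placeholder are hypothetical; nothing here is a Unity API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Records, per partition key, the order in which bundles were "loaded".
var completionOrder = new ConcurrentDictionary<int, List<string>>();

// Placeholder for a real load; each key's list is touched by only one task.
void LoadBundle(int partitionKey, string bundleName) =>
    completionOrder.GetOrAdd(partitionKey, _ => new List<string>()).Add(bundleName);

var requests = new List<(int Key, string Bundle)>
{
    (0, "pack0/a"), (1, "pack1/a"), (0, "pack0/b"), (1, "pack1/b"), (0, "pack0/c"),
};

// One task per key; GroupBy keeps each group's elements in source order,
// so requests within a partition are processed strictly in submission order.
Task[] workers = requests
    .GroupBy(r => r.Key)
    .Select(partition => Task.Run(() =>
    {
        foreach (var request in partition) LoadBundle(request.Key, request.Bundle);
    }))
    .ToArray();
Task.WaitAll(workers);

foreach (var (key, names) in completionOrder.OrderBy(kv => kv.Key))
    Console.WriteLine($"partition {key}: {string.Join(", ", names)}");
```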

    The extensions in the document are definitely a leap forward, though, and I look forward to seeing them implemented!
    One note: I was initially confused by the bool streaming parameter in the compression parameters. I thought it had something to do with compression, but it looks like it's related to whether AssetBundle.LoadAsset() and similar functions load data on demand?

    4. Currently the BuildPipeline.BuildAssetBundles call reads YAML files off the disk for metadata. I store this data, plus extra metadata, in LevelDB for use in external tools, and it'd be great if I could just implement a C# interface so that the asset bundle build system reads and writes metadata from LevelDB instead of parsing a bunch of YAML files off the disk with random seeks all over the place. It would also give me one source of truth instead of having to mine the YAML files for metadata.
    In the same manner I'd like to be able to store the built asset bundles in a custom way to optimize reads during development, so a callback allowing me to provide a System.IO.Stream for writing built asset bundles would be great.
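
    A sketch of the write side of point 4, assuming a hypothetical build step that asks a caller-supplied factory for each output Stream instead of opening files itself. BuildBundle and the Func<string, Stream> factory are made up for illustration; a Dictionary of MemoryStreams stands in for a custom store such as a packfile or LevelDB.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical build step: the storage backend is whatever the factory returns.
void BuildBundle(string bundleName, string serializedContent, Func<string, Stream> openOutput)
{
    using Stream output = openOutput(bundleName);
    byte[] bytes = Encoding.UTF8.GetBytes(serializedContent);
    output.Write(bytes, 0, bytes.Length);
}

// In-memory backend standing in for a custom store: one MemoryStream per bundle.
var store = new Dictionary<string, MemoryStream>();
Stream OpenInStore(string name) => store[name] = new MemoryStream();

BuildBundle("characters/hero", "hero data", OpenInStore);
BuildBundle("levels/intro", "intro data", OpenInStore);

// MemoryStream.ToArray still works after the stream has been closed.
foreach (var (bundle, stream) in store)
    Console.WriteLine($"{bundle}: {stream.ToArray().Length} bytes");
```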

    Great to see the Player finally getting compression!
     
    Last edited: Oct 5, 2015