
Question Best way to get a constant array into a c# script

Discussion in 'Scripting' started by Ne0mega, Jun 26, 2023.

  1. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    I like to pre-cache distance values into a static array, so they can be accessed without calculating at all.

    The problem is that every time the game starts, it has to calculate all of these. It only takes an extra second or so, but I would still rather just have a C# file with a constant array, i.e.

    float[] distances = new float[1000000] { 0.717f, 1f, 1.414f, ... millionth distance value };

    So what is the best way to output, copy and paste, then input these values? Should I just build a giant string in XML with commas between the values, or maybe a text file, then copy and paste?
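    As a sketch of the "generate the source file" route being asked about here: a small Python script (all names are illustrative, not from the thread) that computes the values once offline and emits them as a hardcoded C# array, ready to paste or drop into the project.

```python
# Illustrative generator: computes grid distances once and writes them
# out as a C# source file with a constant array. Class and file names
# are made up for this sketch.
import math

def generate_distance_table_cs(size, path=None):
    """Build (and optionally write) a C# source file holding precomputed
    distances for a size x size grid of integer offsets."""
    values = ", ".join(
        f"{math.sqrt(x * x + y * y):.6f}f"
        for y in range(size)
        for x in range(size)
    )
    source = (
        "public static class DistanceTable\n"
        "{\n"
        f"    public static readonly float[] Values = "
        f"new float[{size * size}] {{ {values} }};\n"
        "}\n"
    )
    if path is not None:
        with open(path, "w") as f:
            f.write(source)
    return source

# Small demo: a 2 x 2 grid. The real table would use size=1000 and a path.
print(generate_distance_table_cs(2))
```

    The same approach works as a Unity editor script; generating the file avoids any copy-and-paste step entirely.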
     
  2. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,561
    Editor script, Python script, etc.

    Are you sure you're getting any benefit? Watch this starting at around the 28-minute mark.

    Looking things up in huge tables is sloooooooow compared to some arithmetic.

    See the 30:30 mark...

     
    angrypenguin likes this.
  3. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    1,833
    If you really have one million of them, use a binary data file. You may even want to partition it so you're not loading the whole 4 MB at once. But I agree with Kurt: lazy calculate and memoize.
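    A minimal sketch of the "lazy calculate and memoize" suggestion, in Python for brevity (the same pattern translates to a C# Dictionary keyed on the integer offset):

```python
# Memoized distance: computed on first use for a given offset,
# then served from the cache on every later call.
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def grid_distance(dx, dy):
    """Distance for an integer grid offset; computed at most once."""
    return math.sqrt(dx * dx + dy * dy)

print(grid_distance(3, 4))  # 5.0 (computed)
print(grid_distance(3, 4))  # 5.0 (cache hit)
```

    This way only the offsets actually used ever get stored, instead of paying up front for all one million entries.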
     
  4. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    I can't watch the video, as my bandwidth is almost out for the month.

    But I don't see how looking something up in a 1000 x 1000 float[,] would be slower than calculating the distance each and every time. Is that what he is addressing?

    Is it really faster to do a Vector2 distance calculation than a Vector2 addition plus a read from a float[1000, 1000]?

    If so, I'll have to bookmark it and watch once I allow myself more bandwidth next month. (Trying to make a game, not feed my internet addiction.)
     
  5. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,503
    It's not just about what's "faster", it's also about whether or not the difference in speed makes a difference. Is this an operation which actually bottlenecks your application?

    That said...

    It depends on whether or not it's cached, but yes, on modern processors it's often much faster to re-calculate a value than to access it from un-cached memory.

    And yeah, I just watched it, that's exactly what Kurt's video says. In the given example it's roughly 10x slower to access something from RAM than to perform a square root, which is the 'expensive' part of a distance calculation. Unless you are confident that your value will be in the same cache line as other stuff you're already getting, there's no reason to assume that pre-calculating it and storing it elsewhere would speed anything up.

    Of course, if performance really matters you can't afford to assume anything, you've got to measure it.
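    The "measure it" habit can be sketched with nothing but the standard library. Numbers from a Python interpreter won't mirror compiled C# or raw cache behaviour, so this is only the shape of the test, not evidence either way; the point is timing both candidates on identical inputs before deciding.

```python
# Skeleton of a micro-measurement: time the table lookup against the
# direct computation. Sizes and inputs are illustrative.
import math
import timeit

SIZE = 1000
table = [math.sqrt(x * x + y * y) for y in range(SIZE) for x in range(SIZE)]

def lookup(x, y):
    return table[y * SIZE + x]

def compute(x, y):
    return math.sqrt(x * x + y * y)

t_lookup = timeit.timeit(lambda: lookup(123, 456), number=100_000)
t_compute = timeit.timeit(lambda: compute(123, 456), number=100_000)
print(f"lookup: {t_lookup:.3f}s  compute: {t_compute:.3f}s")
```

    For Unity specifically, the Profiler on the target device is the equivalent of this, and the only numbers that actually count.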
     
    CodeRonnie and spiney199 like this.
  6. Kurt-Dekker

    Kurt-Dekker

    Joined:
    Mar 16, 2013
    Posts:
    36,561
    Obviously it matters what the computation is, but two FMULs, an add, and an FSQRT clearly come in at far fewer cycles than a main RAM fetch.

    If you're an old-skool coder like me that seems foreign, and certainly back in the early 6502, Z-80, 8086, 68000 days FSQRT was slow, as you did it in software.

    Today the floating-point ALU/FPU is largely lookup tables anyway, at least as far as I understand it; that's how it's so fast. So why make your own tables if all they do is encode a few multiplies and a square root? Why not just let the ALU/FPU do what it does?

    Besides, that's too much to maintain, unless of course you actually identified it as a bottleneck via profiling.

    Otherwise I would focus on making the games. :)
     
  7. FaithlessOne

    FaithlessOne

    Joined:
    Jun 19, 2017
    Posts:
    256
    Because I was curious about this recalculate-or-cache question regarding object distances, I asked ChatGPT a couple of questions.

    Summary: it depends on where the data is stored, but a physics engine with a max-distance filter would likely be the fastest option, said GPT.

    Here is the chat transcript:
     
  8. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    Thanks, Kurt.

    It's a pattern I used for searching all points within a radius via index, for AI sight. (Strategy/RTS AI on a world map, where everything is counted and analyzed: resources, enemies, etc.) This was certainly better than colliders, considering what needed to be seen.

    I then wanted to have the AIs estimate their ETAs, but movement speeds constantly change as the seasons change, so that crossing deserts is slower in summer, polar areas in winter, etc. Since I already knew the pattern of creating a generic table, I just made one for distances too. But it uses integer points for distance, and I did wonder whether there was already an internal table for such a thing. Coding/maintaining it was/is easy, but maybe it is just wasting 4 MB of RAM. I dunno, I guess I'll stress test, though stress testing is not so easy, heh.

    I also use the table for quick center-of-gravity calculations for clusters of units, but that is not called often enough to affect performance. I figured that since the table was already there, why not nix those sqrts too.
     
    Last edited: Jun 26, 2023
  9. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    Hmm, re the ChatGPT question: all possible distances for the map are calculated at the beginning of the game. Then every unit AI just looks them up per frame (with a Vector2 addition or subtraction), as opposed to calculating them per frame.
     
  10. halley

    halley

    Joined:
    Aug 26, 2013
    Posts:
    1,833
    ChatBS is always going to produce random stuff it's come across as if it's the gospel truth. Speaking of stuff it's come across, we already stated why calculating is typically the better approach on modern systems.
     
  11. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,043
    Exactly.

    --
    Aside from computation physically taking fewer instruction cycles (which I'll get to in the second paragraph), the expedience in software execution is mostly because of data locality. Modern CPUs cache a lot of memory (in multiple tiers) and try to keep frequently accessed data local so it can be cached in bulk, and when software is done right, that's like turning TURBO on. Huge lookup arrays were useful in the 80s, but on modern architectures that's no longer software done right; it kills all opportunities to maintain locality and optimize the cache, making the CPU more likely to have to swim through piranha ponds every time it needs some data.

    Do not underestimate the computing power of modern CPUs. Since 2010 or so we have been in an era where actual computation is faster than fetching raw data from RAM, thanks to many advances in how caching works (and prediction algorithms), but primarily thanks to advanced instruction sets and significant ALU optimizations (as highlighted by Kurt).
     
    Last edited: Jun 27, 2023
    Ne0mega likes this.
  12. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,503
    That makes sense. Reading one value from an array is going to be faster than looking up data from two different Colliders and/or their Transforms (i.e. at least 2 lookups, potentially far more*) without even considering what math you do with them afterwards.

    As with most other things, there's no one-size-fits-all optimal solution. If you want your stuff to run well then you need to roughly understand your hardware, and what your code is making it do. In this case, that means understanding that memory reads aren't free, and thinking about where and why your code is performing them.

    * At least 2 for the Colliders. As Colliders rely on their Transforms for things, good chance there's at least 2 more reads to get data from those. If the data in Colliders / Transforms isn't packed in a way which suits the work you're doing, then you may in fact require more than one read of each to get what you need.
     
  13. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    Well... wow.
    Thanks everyone, I get it.
    I'll definitely watch the video when I get a chance. Mike Acton's ECS video helped me understand a lot too, mechanics like the L1, L2, and L3 caches.

    I feel like I just learned a secret, since it seems everyone talks about square roots like they are death.

    These CPU and GPU things continue to blow my mind with how fast they are.
     
    Last edited: Jun 27, 2023
    angrypenguin likes this.
  14. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    Since we are on the subject...

    Right now my AI sight, as I said, does a Vector2 add to know where to start reading off a list of indexes to view, then views a couple of things, each on their own map.

    I used to use ints, because I heard that everything got converted to ints or floats anyway (as opposed to bytes or bools). Then I asked someone who writes a package I use why he used bytes, and he said it does help to use bytes. (I hesitate to say who, because I'd hate to besmirch him if he's wrong, but his stuff is blazing fast and very popular.)

    Now that I know RAM lookups are slow, would I be better off codifying everything into one map, then using math once the AI reads it?

    For example, I have a tree map, a unit map, and a building map. These don't carry much data per point. As of now, the tree map carries: 0 - no tree, 1 - unseen tree, 2 - seen tree.

    The unit map has: 0 - no unit, 1 - unit, 2 - seen unit.

    Knowing what I know now, would it be better to combine these and other values into a single map with a single value, and then use bit shifting or modulus to extract the values?

    The sight thing is the heaviest tax on my FPS at the moment, and when there are 1000+ units, all analyzing dozens to thousands of points per frame, it does start to get bad.
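    The bit-shifting idea being asked about can be sketched like this, in Python for brevity (the operators are identical in C#). The field layout is invented for the example: two bits each for the tree, unit, and building states, since each state only needs values 0-2.

```python
# Illustrative bit-packing: three small per-cell states in one byte.
# Bits 0-1: tree, bits 2-3: unit, bits 4-5: building (2 bits = values 0-3).
TREE_SHIFT, UNIT_SHIFT, BUILDING_SHIFT = 0, 2, 4
FIELD_MASK = 0b11

def pack(tree, unit, building):
    """Combine three 2-bit states into a single small integer."""
    return (tree << TREE_SHIFT) | (unit << UNIT_SHIFT) | (building << BUILDING_SHIFT)

def unpack(cell):
    """Recover the three states from one packed cell value."""
    return (
        (cell >> TREE_SHIFT) & FIELD_MASK,
        (cell >> UNIT_SHIFT) & FIELD_MASK,
        (cell >> BUILDING_SHIFT) & FIELD_MASK,
    )

cell = pack(tree=2, unit=1, building=0)  # seen tree, unit, no building
print(unpack(cell))  # (2, 1, 0)
```

    With this layout one byte array replaces three, so a sight pass touches a third of the memory; whether that wins in practice is exactly the "measure before and after" question from the replies below.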
     
    Last edited: Jun 27, 2023
  15. angrypenguin

    angrypenguin

    Joined:
    Dec 29, 2011
    Posts:
    15,503
    That's actually covered directly in the video Kurt linked, when you get a chance to watch it, but it doesn't give a general purpose fix because it depends on what work you most commonly do with that data.

    In short, if you're using a whole int to store what is effectively 2 bits of data then there's a huge amount of overhead in storing, moving and using that data, so there's probably room to optimise it. As for how, the first question is "what work do you most often do with each of your types?" If you have a common operation which needs to know about trees, units and buildings by location then there's a good chance that it's worthwhile to pack them together.

    But for things where it matters I'm not happy with "a good chance". I want to know. So I'd make a test case, put it on my target device(s), and measure it before and after the change.
     
    Kurt-Dekker likes this.
  16. orionsyndrome

    orionsyndrome

    Joined:
    May 4, 2014
    Posts:
    3,043
    I feel you've now flipped this on its head. It's not that lookups are slow; it's that this particular use case made everything potentially slower than it normally is. You've traded one bad thing for a horrible evil one disguised in good intentions.

    Square roots are not death. You still want to minimize excessive work anyway: don't normalize vectors or compute magnitudes if you don't have to. Quaternion multiplication is cheap. Vector dot and cross products are cheap.
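    One common instance of "don't compute magnitudes if you don't have to", sketched in Python (the trick is the same with Vector2.sqrMagnitude in C#): when a distance is only compared against a radius, compare squared values and skip the square root entirely.

```python
# Radius check without a square root: since both sides are non-negative,
# d <= r is equivalent to d*d <= r*r.
def within_radius(ax, ay, bx, by, radius):
    dx, dy = bx - ax, by - ay
    return dx * dx + dy * dy <= radius * radius

print(within_radius(0, 0, 3, 4, 5))    # True  (distance is exactly 5)
print(within_radius(0, 0, 3, 4, 4.9))  # False
```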

    It accumulates if you don't pay attention, and eats away at your frame budget with every single thing you add. As your project grows you cannot possibly hunt down every single computation optimization opportunity, and if you do that too late, the readability of your code will likely suffer. Plan such computations in advance, make them readable from the get-go, and minimize trigonometry, square roots, and anything fancy with matrices and transforms if you can help it.

    Use Mathematics and Burst for heavy-duty work. Relegate stuff to shaders where possible, etc.
    But don't play smart and try to avoid the unavoidable.

    Basically, learn to code responsibly, not religiously.
     
    Ne0mega likes this.
  17. Ne0mega

    Ne0mega

    Joined:
    Feb 18, 2018
    Posts:
    702
    I watched the video. Thanks. That nipped my supposed solution in the bud. I was starting to think it was an optimization pattern I could use for all sorts of operator avoidance to save some cycles.

    I see why the pattern is good for things like pre-caching definitions where one would otherwise have to access RAM multiple times, such as defining a point based on the 8 or 21 or more points around it, but terrible if it just "saves" a couple dozen cycles while inflating them five or ten times over.

    Back to the original question:
    .bin files: I can do that, but I seem to remember that Application.dataPath caused all sorts of problems, and I had to switch to Application.persistentDataPath, which I later found I'll have to redefine again because of different file systems on different platforms; I intend to do that once I get there.

    My question is: can I save a .bin as an asset that does not require fetching from a spelled-out file path, in particular not from Application.persistentDataPath? Right now, my persistent data files are XML files for modding (not bins, another correction from Kurt a few years ago, so thanks again Kurt), and I would like to avoid clutter in there that could confuse the player or let them break the game when modding.
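    On the Unity side, a binary file saved with a .bytes extension inside the project is imported as a TextAsset, and its raw contents are available through the TextAsset.bytes property, so no runtime file path is needed. The raw-binary round trip itself can be sketched like this, in Python for brevity (the C# equivalents would be BinaryWriter/BinaryReader); the file name is illustrative.

```python
# Illustrative round trip: a float table as little-endian 32-bit floats.
import struct

def save_table(values, path):
    """Write the table as raw little-endian float32 values."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))

def load_table(path):
    """Read the whole file back as a list of float32 values."""
    with open(path, "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

save_table([0.0, 1.0, 1.414], "/tmp/distances.bytes")
print(load_table("/tmp/distances.bytes"))
```

    Note that float32 storage loses precision versus Python/C# doubles, so values like 1.414 come back only approximately equal.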