
Are computers about to be turned inside out?

Discussion in 'General Discussion' started by Arowx, Nov 23, 2016.

  1. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194


    Computers process information: information goes in, gets processed, and comes out.

    Now computers are really, really fast at processing data, but very slow at getting to it.



    This could be because we have built computers the wrong way.

    Partly this is because data was too big for the computer's main memory and had to be stored externally on slower, higher-capacity devices.

    Even system RAM is a bus journey away from the processor, and data then has to pass through layers of caches or 'queues' before it reaches the processor cores.

    How fast would your computer be if its processor had instant access to the data it needed? We would say goodbye to loading bars and file transfer dialogues.

    But this is crazy talk; without some magical technology that provides quantum-linked or entangled bits, we will always have to move the data to the processor.

    Now look at the trend: HDD, SSD, larger caches, HBM, SSDs strapped onto GPUs, 3D XPoint and other memristor-style technologies.

    One way could be to strap a cheap ARM CPU to every RAM stick, making smart RAM. The main CPU would request the work and send over the processing code, and the stick would just flag the data as processed when it's done.


    (early attempt at smart ram, a Raspberry Pi supercomputer ;)).

    Another way to make RAM smarter could be memristor technology, so RAM would also include an FPGA-like memristor element that can be configured with the code needed to process the data.

    So we could end up with computers that move the processing to the data rather than the data to the processor.
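
    To make the idea concrete, here's a minimal hypothetical sketch of what a "send the code to the data" interface might look like from the programmer's side (all names are made up; no such hardware API exists today):

    Code (CSharp):
    using System;

    // Hypothetical interface for a "smart RAM" block: the host hands it a small
    // kernel (the code), and the block applies it to the data it already holds.
    public interface ISmartRamBlock<T>
    {
        void Submit(Action<T[]> kernel);   // send the processing code to the data
        bool IsProcessed { get; }          // flag set by the block when it's done
    }

    // A plain in-process stand-in, just to show the shape of the idea.
    public sealed class FakeSmartRamBlock<T> : ISmartRamBlock<T>
    {
        private readonly T[] _data;
        public FakeSmartRamBlock(T[] data) { _data = data; }
        public bool IsProcessed { get; private set; }

        public void Submit(Action<T[]> kernel)
        {
            kernel(_data);        // "processing happens where the data lives"
            IsProcessed = true;   // flag the data as processed
        }
    }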

    We are seeing a subset of this problem in gaming with the divide between the CPU, GPU and their data.



    It's logical really: what's larger to move, the data or the processing code needed to act on it?

    What would a gaming smart ram system look like?

    Would smart ram systems be more modular?

    Would we still be limited by bandwidth between smart ram blocks?
     
    Last edited: Nov 23, 2016
  2. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    Sigh.

    Well, you outdid yourself this time. This post tops your previous ones. Even the one about double precision.

    I'd advise doing some OpenCL/CUDA programming, or at least programming in a language that allows direct memory access, like C or assembly. That would give you a better idea of how memory actually works.
     
    Kiwasi, landon912, Teravisor and 9 others like this.
  3. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194

    Here's an example of timings for computer operations, based on a 1 GHz CPU:

    execute typical instruction            1 nanosec (1/1,000,000,000 sec)
    > fetch from L1 cache memory           0.5 nanosec
    branch misprediction                   5 nanosec
    > fetch from L2 cache memory           7 nanosec
    mutex lock/unlock                      25 nanosec
    > fetch from main memory               100 nanosec
    send 2K bytes over 1 Gbps network      20,000 nanosec
    read 1 MB sequentially from memory     250,000 nanosec
    fetch from new disk location (seek)    8,000,000 nanosec
    read 1 MB sequentially from disk       20,000,000 nanosec
    send packet US to Europe and back      150,000,000 nanosec (150 milliseconds)

    Note the roughly 10x jumps between L1, L2 and main memory. The source is a bit dated, but RAM access involves more than just the raw timings of your DRAM.
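
    As a rough illustration of those cache effects, here is a minimal sketch that compares a sequential pass over a large array with a strided pass over the same array (exact numbers will vary wildly with CPU, runtime and build settings):

    Code (CSharp):
    using System;
    using System.Diagnostics;

    class CacheDemo
    {
        static void Main()
        {
            int[] data = new int[16 * 1024 * 1024]; // 64 MB, far larger than any cache
            long sum = 0;

            // Sequential pass: the whole array is streamed from memory once,
            // and every byte of each 64-byte cache line gets used.
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < data.Length; i++) sum += data[i];
            sw.Stop();
            Console.WriteLine($"sequential: {sw.ElapsedMilliseconds} ms");

            // Strided passes: each pass touches one int per 64-byte cache line,
            // so the whole array is pulled from main memory 16 times instead of once.
            sw.Restart();
            for (int stride = 0; stride < 16; stride++)
                for (int i = stride; i < data.Length; i += 16) sum += data[i];
            sw.Stop();
            Console.WriteLine($"strided:    {sw.ElapsedMilliseconds} ms (same additions, worse locality)");

            Console.WriteLine(sum); // keep the loops from being optimized away
        }
    }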

    Or have you read Forget Moore’s law: Hot and slow DRAM is a major roadblock to exascale and beyond
     
    Last edited: Nov 23, 2016
  4. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Right well, this doesn't solve Unity not moving fast enough to make my game for me.
     
    Kiwasi and dogzerx2 like this.
  5. Socrates

    Socrates

    Joined:
    Mar 29, 2011
    Posts:
    786
    "Are computers about to be turned inside out?"

    Only if they go through the Galaxy Quest transporter.

    They'll also explode...
     
  6. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Think how fast a SmartRAM(tm pending) computer could be at lightbaking.
     
  7. LaneFox

    LaneFox

    Joined:
    Jun 29, 2011
    Posts:
    7,381
    You mean you guys aren't using Quantum Entanglement Ram yet?

    What the heck, get with the program guys geez.
     
    Kiwasi and landon912 like this.
  8. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,950
    "And it exploded..."
     
  9. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Sorry dude but most indie developers don't work for the NSA or Google!
     
  10. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    It wouldn't be any faster, because the primary bottleneck is not memory but computational power. Wanna speed it up? Parallelize while using normal hardware.

    Besides, you'd need to explain how the heck your "smart memory" is supposed to work.

    Honestly, it just sounds like you found a new buzzword idea and decided that this one should be a "miracle that changes everything forever!" or something like that. That's not really how it works.
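
    For what it's worth, "parallelize on normal hardware" is already a one-liner in C#. A minimal sketch (FakeBakeTexel is a made-up stand-in for whatever expensive per-texel work a real baker does):

    Code (CSharp):
    using System;
    using System.Threading.Tasks;

    class ParallelBakeSketch
    {
        static void Main()
        {
            float[] lightmap = new float[1024 * 1024];

            // Spread the per-texel work across all available cores.
            Parallel.For(0, lightmap.Length, i =>
            {
                lightmap[i] = FakeBakeTexel(i);
            });

            Console.WriteLine($"baked {lightmap.Length} texels");
        }

        // Hypothetical placeholder for the real per-texel computation.
        static float FakeBakeTexel(int index) => (float)Math.Sqrt(index % 4096);
    }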
     
  11. Shorinji

    Shorinji

    Joined:
    Oct 8, 2012
    Posts:
    21
    When I read the forum I never check who posts what, I just read the information contained in the threads.
    But strangely, I can always guess when a thread has been posted by Arowx :D
     
  12. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    @Arowx: Basically... a supercomputer cluster will destroy your "smart memory" machine (how the hell is this supposed to work, anyway?) in terms of light baking speed. And that cluster can be composed of large numbers of identical nodes which exchange data over a network, and not necessarily a high-speed one.

    As another example, take the human brain. IIRC the closest equivalent of a clock speed for the human brain is 100 Hz, tops. That's Hertz, not Megahertz and not Gigahertz. However, (if we simplify things a bit) one could say that the brain has a hundred billion CPUs. As a result, you have massive computational power which governs locomotion and body functions, and is wasted on things like watching cats on youtube.

    Basically... I'd recommend improving your programming knowledge, because a lot of your threads revolve around concentrating on a minor non-problem and then trying to find a "revolutionary, never seen before, magical way that will change the world forever!" approach to "fix" it.
     
    Kiwasi, MV10 and Deleted User like this.
  13. hippocoder

    hippocoder

    Digital Ape Moderator

    Joined:
    Apr 11, 2010
    Posts:
    29,723
    Nitpicking, but new studies show that the brain is way, way faster than 100 Hz. In fact, even seeing (and understanding what is seen) has been shown to take just 13 ms.

    Reacting to it is another matter entirely though. And actually, lightmap baking would be a LOT faster if it all fit in L1, so yeah, memory really kinda matters.

    Just not the solutions proposed in this thread... still doesn't help me make my game though.
     
    dogzerx2 likes this.
  14. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    Well, the brain is analog circuitry (with loads of gotchas, like chemical signals in addition to electrical ones), so there's no clock. The speed of a signal passing through a neuron can indeed be very high. However, there's still going to be some cyclical process going on all the time, and that cycle can be treated as a clock of sorts. That'll be the 100 Hz.

    I think it's a decent comparison, because the clock in a computer's CPU orchestrates everything in a similar fashion... except the speed is much higher.
     
  15. imaginaryhuman

    imaginaryhuman

    Joined:
    Mar 21, 2010
    Posts:
    5,834
    I'm sure it will all gradually get faster. Memory chips are now much faster than they used to be. Caches are bigger. Take the PlayStation 4, for example: they strapped on 'graphics memory' chips as the main memory for a major speed boost. The other issue, though, is parallelization... more CPU cores have to struggle to share the memory bus and so on. Obviously, the closer stuff gets to the CPU the faster things will be.
     
  16. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    @neginfinity good point, the brain could be classed as the ultimate Smart RAM, as it combines memory and logic into biological neurons and axons. The logic that makes a neuron fire and the memory that the neuron represents are combined.

    In traditional IT we have separated memory and logic.

    That's why the concept of the memristor could massively change the IT industry and, due to its similarity to a neuron, AI research as well. Combine that with its low power needs compared to DRAM.

    Computing is just bringing logic and data together; funny that nature has already provided an amazing nanometre-scale solution to this problem, with neurons weighing in at around the 10 nm level.

    The question is what level of granularity Smart RAM should work at (bits, bytes, kilobytes, megabytes, gigabytes) for optimal bandwidth?
     
  17. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    2,980
    Smart RAM would be dumb, because it would create a processing bottleneck instead of removing a bandwidth bottleneck.

    A better solution would be something like HBM where the RAM is installed on the CPU. Eventually we will probably see a CPU or APU with enough HBM to be useful in many use cases. This would most likely be at 8GB.
     
  18. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,950
    Are "computers" even still a thing?
     
    Kiwasi likes this.
  19. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    So isn't a CPU or GPU with stacked HBM effectively a single Smart RAM module?

    You do understand that HBM on a modern CPU would be an L4 cache? There are already L1, L2 and L3 caches built into modern CPUs. If putting memory next to your processor, i.e. Smart RAM, is so dumb, why do Intel, AMD and ARM build chips with caches? (hint)
     
  20. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Isn't that the wrong question?

    "Is anything not a computer?" would have been more relevant.
     
  21. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,950
    Everything (nearly) is a connected processor; computers as a general-purpose box are becoming a minority device. Even they rely more and more on distributed or remote processing for data/content.
     
  22. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    2,980
    The difference is performance. You were asking about adding an ARM processor to RAM, which would actually create a bottleneck.
     
  23. I_Am_DreReid

    I_Am_DreReid

    Joined:
    Dec 13, 2015
    Posts:
    361
    What is going on here??
     
  24. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,026
    Just someone throwing around a buzzword. :p
     
    zombiegorilla and Deleted User like this.
  25. zombiegorilla

    zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    8,950
    RISC architecture is going to change everything.
     
  26. Teravisor

    Teravisor

    Joined:
    Dec 29, 2014
    Posts:
    654
    I sure don't want THAT in my gaming PC.
    It will be smart enough to stop me from gaming...
     
  27. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Just look at the trends in the computing industry: IBM and HP pushing for fibre-optic connectors for their servers (both within and between servers).

    Intel, AMD and Nvidia using HBM and HBM2 memory to put the memory right next to the CPU and GPU.

    They are all working to turn the traditional computer inside out, putting the processor next to the memory and massively boosting the bandwidth between components.

    There is a consortium working on a next-gen memory interconnect (article).


    Did you notice that this IT consortium thinks "system memory is flat or shrinking", and that memory bandwidth is not keeping up with core counts?

    I think there is enough of a trend here to make a prediction: expect Smart RAM systems with optical interconnect buses to be the new standard within 5 years (so 2021).

    Ideally we will end up with modular blocks of CPUs/GPUs with HBM2+, needing only a fibre-optic cable and a power cable to connect them to your system.

    Systems will still be limited by their power unit's capacity and the motherboard's Smart RAM slots, but with a system like this you could upgrade more easily and, who knows, even change brand or combine brands (what, I'm a dreamer on the Unity forums after all).

    In a way Smart RAM is just a similar approach to DX12/Vulkan: GPUs got smarter, so they didn't need as much information from the CPU, so why all the draw calls? Only here, instead of the central CPU doing all the processing, it delegates to the relevant Smart RAM blocks.
     
    Last edited: Nov 24, 2016
  28. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    True.

    False. Better bandwidth/different architecture != "inside out".

    No. Your vague idea of SmartRam has nothing to do with higher memory bandwidth.


    ---
    They are eliminating bottlenecks and making faster RAM. That's reasonable.
    You want to stick an extra cpu into a memory stick. That's pointless.
     
  29. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    You seem to be mixing up the facts. HBM/HBM2 puts the memory next to the processor (CPU or GPU); it is a Smart RAM solution.

    The idea of Smart RAM is just that you bring the processing next to the RAM.

    It flips the bandwidth problem on its head: if you have high-bandwidth modular blocks of processors and RAM, you can get away with lower-bandwidth connections that just issue commands.

    When processing information, it is often the case that the program is a lot smaller than the data it affects.

    The question you should be asking is "How often is the code smaller than the data?", as only when the data is bigger do we need a Smart RAM style solution.

    So it is definitely a valid approach for high data volumes: supercomputers (Knights Landing with HBM), database servers, web servers and GPUs.

    But what about games? How often are games data heavy?

    Are games getting bigger, with higher resolutions, larger worlds and more players?

    Probably we will see Smart RAM style systems in the game development pipeline for crunching the data: lighting, modelling, generation, and training the deep learning neural networks for NPCs and AI (it's going to happen sooner or later).
     
  30. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    As far as I can tell, you're the one who invented "SmartRAM", and that name is not used by anyone else.
    The name alone implies that you put an additional CPU on the chip.

    What HBM seems to be about is similar to the old SoC concept.

    This statement doesn't make any sense to me.

    Sigh...

    Seriously, just take programming or computer science course already.
     
  31. ShilohGames

    ShilohGames

    Joined:
    Mar 24, 2014
    Posts:
    2,980
    Actually, the opposite is more likely. Before recommending fiber optic as a magic cure all for bandwidth, you need to realize that fiber optics will add additional cost, size, weight, and latency. On each end of a fiber optic link, there are emitters and detectors that convert between electrical signals and optic signals. That layer is not a significant problem on large network devices, but it would pose a serious challenge if fiber optics were used extensively inside a computer system.

    By contrast, HBM style solutions literally place the memory directly on the CPU/APU/GPU, so there is no need for a large external interconnect between the CPU and the RAM.

    The next question is whether fiber optics could be used to connect an HBM-style CPU to the rest of a computer. That would be more reasonable than using fiber optics between the CPU and RAM, but it would still create a layer of bottlenecks. There are a lot of parallel traces between the CPU and the rest of the subsystems. Converting all of those into separate fiber optic paths would be completely impractical, due to the need for emitters and detectors on each path. The alternative would be to develop a next generation serial link for the CPU to connect through using a single fiber optic, but that would most likely not offer enough bandwidth to be useful.

    There are always tradeoffs in circuit designs. Fiber optics would most likely not be very useful between the CPU and the rest of the subsystems, unless there were some breakthroughs in the electronics to support fiber optic solutions. Examples of required breakthroughs would be things like integrating the fiber optic emitters and detectors directly into a CPU design, and being able to manufacture tiny fiber optics traces directly onto PCB so boards could be automatically manufactured with thousands of fiber optic traces run precisely on boards.
     
  32. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    OK let's try a simple thought experiment.

    Space Invaders: 100 enemies need to move right this frame, and they will also have their animation frame updated.

    pseudocode:

    Code (CSharp):
    foreach (var invader in spaceInvaders)
    {
        invader.position.x += moveDelta;
        invader.sprite.frame++;
    }
    From this we can gather that an invader has at least a 2D position and a reference to a sprite that has a frame index.

    So let's say that an invader takes up at least 9 bytes; x 100 that's 900 bytes, or roughly 1 KB.

    We are doing about 6 operations per invader, so about 600 ops or ticks to do the job.

    On a standard RAM system it takes 100 ticks to fetch the data before we work on it and the same again to return it to memory, so about 800 ticks to do the job (assuming the data is sent as a single block with no overheads).

    What if a Smart RAM system inherently distributed the data and jobs evenly (just like Unity's threaded job system)?

    You would still have the overhead of sending the commands/code to do the jobs (but, just like on a GPU, once sent they can be remembered or cached and re-triggered).

    So it would depend on the number of Smart RAM blocks you have in your system; it's inherently parallel.

    1 Smart RAM block: 200 + 600 = 800 ticks (i.e. a normal system)
    2 Smart RAM blocks: 200 + 300 = 500 ticks
    4 Smart RAM blocks: 200 + 150 = 350 ticks
    8 Smart RAM blocks: 200 + 75 = 275 ticks

    This is of course assuming that sending the commands to Smart RAM has the same overhead as accessing DRAM, which would probably be valid the first time you send a set of commands. But when you repeat those commands, as games tend to do every frame, you could just send a simpler 'run previous commands' op.

    Think about it a bit: you can purchase low-cost SoCs (systems on a chip) for a few dollars. They are not as fast as your desktop PC, but how many could you buy for the price of a PC?

    Now imagine if you could buy cheap Smart RAM blocks that included a fibre-optic networking socket and an OS that worked with Smart RAM.
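
    As a quick sanity check on those numbers, here's a minimal sketch of the cost model used in this thought experiment (the 100-tick fetch, 100-tick write-back and 600 compute ticks are the assumed figures from above, not measurements of anything real):

    Code (CSharp):
    using System;

    class SmartRamCostModel
    {
        // Assumed numbers from the thought experiment above.
        const int FetchTicks = 100;     // get the data (or send the commands)
        const int WriteBackTicks = 100; // return the results
        const int ComputeTicks = 600;   // ~6 ops per invader * 100 invaders

        // Total ticks if the compute is split evenly across N Smart RAM blocks.
        static int TotalTicks(int blocks) =>
            FetchTicks + WriteBackTicks + ComputeTicks / blocks;

        static void Main()
        {
            foreach (int blocks in new[] { 1, 2, 4, 8 })
                Console.WriteLine($"{blocks} block(s): {TotalTicks(blocks)} ticks");
            // prints 800, 500, 350, 275
        }
    }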
     
  33. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    IBM is working on...
    http://spectrum.ieee.org/semiconductors/optoelectronics/get-on-the-optical-bus
     
  34. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    "IBM is working" means "technology is not ready". It is a waste of time to "imagine the possibilities", because human imagination can run wild.

    Wrong, all of it. Wrong assumption upon wrong assumption upon wrong assumption.

    The amount of time your program spends processing merely a hundred objects will be insignificantly small. The majority of the time will be spent waiting for vsync (and a bit of time will be spent processing audio and painting the invaders).

    So the improvement from faster memory will be zero.

    There is a reason why you must profile before optimizing. There are areas that can be optimized but won't have much impact. Meaning, if you have a piece of code that the program spends 0.1% of its time executing, and you reduce its execution time to zero, you'll get a tiny speedup that will be impossible to notice.
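
    As a minimal illustration of that "measure first" point, here is a sketch using System.Diagnostics.Stopwatch (a real project would use a proper profiler, and MoveInvaders/RenderFrame are made-up stand-ins):

    Code (CSharp):
    using System;
    using System.Diagnostics;

    class ProfileFirst
    {
        static void Main()
        {
            var moveTimer = Stopwatch.StartNew();
            MoveInvaders();          // the code you suspect is "hot"
            moveTimer.Stop();

            var renderTimer = Stopwatch.StartNew();
            RenderFrame();           // the rest of the frame
            renderTimer.Stop();

            // Only optimize the part that actually dominates the frame time.
            Console.WriteLine($"move:   {moveTimer.Elapsed.TotalMilliseconds:F3} ms");
            Console.WriteLine($"render: {renderTimer.Elapsed.TotalMilliseconds:F3} ms");
        }

        // Stand-in workloads, just so the sketch runs.
        static void MoveInvaders() { for (int i = 0; i < 100; i++) { } }
        static void RenderFrame()  { System.Threading.Thread.Sleep(16); }
    }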

    Your CPU is already very likely to be performing operations in parallel and ahead of time:
    https://en.wikipedia.org/wiki/Speculative_execution

    And for highly parallel tasks there are already compute shaders. However, those require a much larger number of operations before anyone would notice a difference. Billions of space invaders, pretty much.
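
    For reference, dispatching work to a compute shader from Unity's C# side looks roughly like this. A sketch only: the "InvaderMove" kernel, the buffer/parameter names and the 64-thread group size are assumptions, and the .compute asset itself is not shown:

    Code (CSharp):
    using UnityEngine;

    public class InvaderComputeSketch : MonoBehaviour
    {
        // Assign a .compute asset in the inspector; it is assumed to define a kernel
        // named "InvaderMove" that offsets a float2 position per invader.
        public ComputeShader invaderShader;

        void Start()
        {
            const int count = 100;
            var positions = new Vector2[count];

            // GPU buffer: one float2 (8 bytes) per invader.
            var buffer = new ComputeBuffer(count, sizeof(float) * 2);
            buffer.SetData(positions);

            int kernel = invaderShader.FindKernel("InvaderMove");
            invaderShader.SetBuffer(kernel, "Positions", buffer);
            invaderShader.SetFloat("MoveDelta", 0.1f);

            // One thread per invader, assuming 64 threads per group in the kernel.
            invaderShader.Dispatch(kernel, Mathf.CeilToInt(count / 64f), 1, 1);

            buffer.GetData(positions); // read the results back to the CPU
            buffer.Release();
        }
    }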

    That doesn't work. A low-cost SoC will have a low clock rate and low computational power, and it will have to talk to the other SoCs via some sort of bus. Talking over a bus introduces overhead; if it is done on a per-frame basis, it will be enough overhead to swamp any improvements. The same goes for fibre-optic cable: light has a finite speed, and a fibre-optic cable will be longer than the distance to a nearby circuit. Also, the information will have to be encoded and then decoded, which also takes time. In addition to that, parallel invaders will require global synchronization in order to ensure that they're accessing the same game state. All of those factors coupled together can easily undo any improvement from faster RAM on some weird architecture.

    Problems in programming must be addressed with precision. There's no Magic Technology that will make things cool. You need to know what you're fixing, where, why and how. Address problems with precision. Throwing buzzwords at the computer doesn't make software work faster.

    ----

    Now please explain to me why the hell you don't already know all this.
     
  35. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Did you notice that the article is a few years old, and that the technology is an adaptation and opening up of optical bus technology to more companies?

    It's not wrong; it's an intellectual exercise based on a hypothetical 1 GHz CPU with 100 ns memory access times and a 1 ns operation time.

    In reality, Space Invaders ran at 30 fps on a 2 MHz 8080 CPU with about 55 aliens.

    It just shows that higher-bandwidth processing can be achieved in less time with a more parallel, closer-to-the-data hardware design.

    Really, you're still focusing on game state in parallel programming? You do realise that you are worried about a state that can be flagged in a single op call in 1 ns on a 1 GHz CPU, in a game that has about 16.7 million ns (16.7 ms) between frames at 60 fps.

    The concept is that with near-memory processing the bandwidth needed is reduced. Optical interconnects have massive bandwidth potential compared to copper, and as more processing is done closer to the data, you need less bandwidth between components.

    This is not just my opinion; if you look, there is a trend within the IT industry where research and development into faster processing is pushing processors and memory closer together and driving the need for higher-bandwidth connectors.

    Unity's job system, multithreading, multi-core CPUs, massively parallel GPUs, HBM/HBM2+ memory, optical connectors... why the hell don't you get this?
     
  36. QFSW

    QFSW

    Joined:
    Mar 24, 2015
    Posts:
    2,905
    Wait so lemme get this straight
    The issue is no longer needing 1000s of CPU cores
    or double precision 64 bit memory
    or not having VR everything
    But RAM being too slow?
     
    Ryiah likes this.
  37. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    No, just too far away from the CPUs/GPUs, so the time it takes to access the data is orders of magnitude greater than the time it takes to process it.

    Check your CPU's specs: how many levels of cache does it have?
     
  38. yoonitee

    yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,364
    You're probably right. That's kind of how I believe the brain works. Each neuron is like a "smart" piece of memory. It probably holds the following things:
    • Which slot(s) of short term memory do I belong to at present (if any)?
    • If certain input signals what is a good output signal?
    • Am I currently the focus of the person's attention?
    • Am I a desirable neuron to be activated or not?
    • A memory of the activation of itself in a certain time period? (like its rhythm)
    So for example if you are thinking of the sentence "The cat chased the dog".

    The 'cat' neuron might store the fact that it is second in the short term memory. This is the opposite of how computers work, in which there would be a central unit that stores a reference to each of the words in the sentence. Instead, the memory itself holds where it is in the sentence.

    Each neuron is like a Pavlov's dog. It can learn certain stimuli and responses, which are just signals from other neurons. Maybe happiness is a way of rewarding the neurons, just like you reward dogs to train them! For example, imagine a neuron that receives a signal from a neuron activated by the 'm' sound followed by a signal from a neuron activated by the 'a' sound. This neuron would correspond to the 'ma' sound.

    Imagine that instead of an array of words, you had a dictionary, and next to each word you put the number of the position where that word appears (if at all) in your array.

    That is indeed an inside out computer and I believe it is how the short term memory of the brain works!

    So not so crazy after all. :cool:
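
    A minimal sketch of that "inside out" lookup in C# (a central array of words versus a dictionary where each word carries its own position):

    Code (CSharp):
    using System;
    using System.Collections.Generic;

    class InsideOutMemory
    {
        static void Main()
        {
            string[] sentence = { "The", "cat", "chased", "the", "dog" };

            // Conventional view: a central array holds the words in order,
            // and you scan it to find where a word sits.
            int indexOfCat = Array.IndexOf(sentence, "cat"); // 1

            // "Inside out" view: each word records its own position(s),
            // so the lookup starts from the word itself.
            var positions = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);
            for (int i = 0; i < sentence.Length; i++)
            {
                if (!positions.TryGetValue(sentence[i], out var slots))
                    positions[sentence[i]] = slots = new List<int>();
                slots.Add(i);
            }

            Console.WriteLine(indexOfCat);                         // 1
            Console.WriteLine(string.Join(",", positions["the"])); // 0,3
        }
    }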
     
    Last edited: Nov 25, 2016
  39. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,026
    On the topic of specifications, there is one aspect that I feel completely throws off your argument. If desktop processors are truly held back so significantly by memory bandwidth limitations, then why do they ship with only two memory channels, while server processors and motherboards for the same form factor support more?

    Just as an example, the AMD Zen processor that both of us are hyped over has up to eight memory channels in its server variant.
     
  40. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    Won't the server version of Zen have 32 cores?

    If CPUs were not limited by memory access times, why would they need caches?

    Modern CPUs have about three levels of cache, and HBM or XPoint memory is really a fourth-level, off-die cache.

    Also compare the price of server-level motherboards, memory and CPUs to their desktop equivalents.

    Did you hear about HP's "The Machine" redesign of computing?

    > http://www.theregister.co.uk/2016/11/24/hpes_machinations_to_rewrite_server_design_laws/

    The problem with HP's The Machine is that it still divides processing from memory; it expects photonic buses to deliver an order-of-magnitude improvement in bandwidth and data access performance.

    Smart RAM or HBM2+ is where you just locate the processors next to the data and cut the distance between the two. Then you move the code around and not the data.

    Ideally, with really smart RAM, whichever element is smaller would be moved, i.e. the DATA or the LOGIC.

    Even with a massively parallel system you would want a LOGIC/DATA balance. After all, LOGIC is just data when it moves over a system bus between processors.
     
    Last edited: Nov 25, 2016
  41. Ryiah

    Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    20,026
    Because under the hood a modern processor is more akin to an entire system than a single component. Instructions fed to the processor are not directly handled by the hardware but are simply interpreted by microcode programs running on the internals of the processor. The caches in a processor are effectively the "main memory" for these microcode programs.

    Modern CPUs also have very narrow buses. A single memory channel is only 64 bits wide, and thus a dual-channel system is only bringing in data up to 128 bits at a time. If memory were truly a limiting factor we would see wider buses. Yet the widest available right now is only quad channel, or 256 bits, and only for server hardware.
     
    Kiwasi and passerbycmc like this.
  42. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    It is an anti-intellectual exercise. One of the main rules of programming is "profile before optimizing". If the exercise goes against that rule, it is not a good exercise.

    Yes, because the state is the issue here. All invaders must eventually be synchronized (via a "wall") every frame. Also, it is not exactly a matter of a single op. Parallel computation works better when there's no need for synchronization.

    It is your opinion, because the "trend" is perceived by you.

    Because the equivalent of your argument is "the quality of a building is determined by the color of its bricks and nothing else, hence to make better buildings everybody should start using purple bricks today! All hail purple bricks!".

    All of this - trends, memory, etc. - is essentially irrelevant. Imagining "endless possibilities" (which is what you love doing) is essentially a waste of time, because the world "of possibilities" doesn't exist. You have a project you want to implement. You work on THAT project. You identify the EXACT issues and EXACT bottlenecks in THAT specific project and solve them. Without getting distracted by trends, buzzwords, etc. Tech that is "being developed" is not currently available, does not exist (for practical purposes), and relying on it would be a risk.

    And honestly, before you hit memory bandwidth, you'll run into many other issues (money, manpower, etc). That's the main problem. "Imagine what we could do with faster/smarter/whatever memory!" Nope. First you need to have a project that NEEDS that tech. If you don't have a project that is bottlenecked by this, then talking about it is quite pointless. I mean, you're using invaders as an example.
     
    Ryiah and passerbycmc like this.
  43. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    You don't even seem to grasp that this is not about programming; it's about hardware design.

    Really? So big IT companies like Intel are spending millions in R&D on a "trend" I made up?

    Check out the Intel Knights Landing-> https://www.nextplatform.com/2016/06/20/intel-knights-landing-yields-big-bang-buck-jump/

    72 CPU cores on a single chip combined with 16 GB of on-package RAM.

    This is the high end of what Intel does, their supercomputer chip, and look at the benefits they claim for this new product:


    5 x performance
    8 x perf/watt
    9 x perf/USD$

    They are even calling the up to 384 GB of DDR4 memory you can use with this system 'far memory'.

    Bringing memory closer to the CPU, or taking the CPU to the memory module, can have massive benefits in bandwidth, speed and power consumption.

    So go on, tell Intel that their Knights Landing R&D team was just chasing an Arowx-perceived trend, as I would love to get a percentage of what they make from using my "perceived trend".

    While you're at it, you could contact Nvidia or AMD about their next-gen CPUs/GPUs with HBM2.

    This is about a fundamental change in how hardware systems are being put together. A change that boosts performance and could lead to more modular systems.
     
    Last edited: Nov 25, 2016
  44. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    If you have built a PC, you come to realise that the memory, motherboard and CPU often all have to be changed together.

    The power supply, SSD/HDD and GPU are more modular and easier to change.

    With an optical-bus-enabled motherboard and modular CPU/HBM2 blocks, we could have much easier to upgrade multi-CPU systems in the future.

    Who knows, with the right form factor even mobile devices and game consoles could become more modular and upgradable.

    How much higher in resolution can mobile displays go before we can't tell the difference anyway?
     
  45. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    Hardware is designed to tackle specific problems. If there's no problem to tackle, no reason to design hardware. If there's no problem that can utilize hardware, then buying it will be a waste of money.

    For a mere 6 thousand USD.

    You won't be writing games for this piece of hardware. At least not games that are supposed to have an audience. (There's a slight chance of getting a once-in-a-lifetime contract that involves writing a military training sim or something.)

    There are things that need to be done, and then there's the "infinite realm of possibilities".

    When you work on some sort of project, the "infinite realm of possibilities" is a source of distractions that needs to be exterminated, and the best idea is to concentrate on the very specific issues you need to deal with.

    I have the impression that you think the next buzzword you discover will bring some sort of miracle that will revolutionize everything. In my experience, that is not how it works. There are no miracles. Instead there will be gradual change over the course of many years that may eventually result in something.

    There are a lot of things that "have potential" or fall into the "think about all the possibilities!" category. Often those things result in nothing interesting. For example, we had No Man's Sky with infinite possibilities. Early access games on Steam often show potential and then fail to fully realize it (I'm an owner of Star Forge alpha and Planet Explorers; Star Forge is a complete failure, and Planet Explorers failed to turn into what it could've been). YouTube is also full of "amazing possibilities" things that often do not turn into a mainstream product. For example, there's this:


    Cool, right? Except that nothing similar is being used in any game I know. Because... tool chains, development times, limitations, etc, etc.

    The reason why I don't share your excitement is that I look for practical applications and don't believe in miraculous buzzwords. If a technology is "in development", then it does not exist yet, even if it has been in development for many years. If something has "possibilities!", then I'll believe in those possibilities when they happen, instead of waiting and hoping for the next miracle. This kind of thing.
     
  46. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    I've rebuilt my PC many times and can tell you that this is not the case. All upgrades were sequential. Occasionally the motherboard dies and you have to replace it, while keeping the previous memory sticks and the CPU. Occasionally you need a CPU upgrade, in which case you stick the new CPU into the old socket. Occasionally you need to upgrade the whole thing at once, BUT this is a fairly rare event.

    Putting the whole thing onto a single chip would mean that any time you need an upgrade, you'll have to throw out pretty much the whole computer and will have no choice in the matter.

    Of course, if you have a lot of money to burn, you could do that too.

    However, the point of having a PC was modularity and the user's ability to fine-tune the system to their needs. Marrying the CPU to the RAM sticks means throwing this modularity away, plus increased expense in the case of inevitable hardware failure.
     
  47. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    No, that's a misconception: like Knights Landing with its 16 GB of on-module RAM, or HBM2, you can also have DRAM or 'far' RAM on the motherboard.

    You don't complain about losing VRAM when you change your GPU, do you?

    Interesting issue: would it be better if GPUs had VRAM modules, so you could upgrade your GPU (if the sockets matched) or boost or re-use its VRAM?
     
  48. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    In this case you'll have two different types of memory that probably will need to be dealt with in two different ways.

    A GPU VRAM upgrade can theoretically be done (it requires one heck of a skill, because the RAM chips are normally soldered on by robots). Modular GPUs probably won't happen, because this would require a universal GPU architecture of sorts, and GPU manufacturers compete with each other (see how the modular smartphone idea went). A unified memory architecture with fast RAM accessible at the same speed by both the GPU and the CPU (like on consoles) would be a much more useful idea than trying to make RAM "smart". I think AMD/ATI was working in this direction.
     
  49. Arowx

    Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    How do you program for the L1, L2 or L3 caches on your current CPU? This would just be an L4 cache.
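
    Worth noting: you don't program the caches directly; you program access patterns that the caches reward. A minimal sketch of the classic example, comparing row-major and column-major traversal of the same 2D array:

    Code (CSharp):
    using System;
    using System.Diagnostics;

    class LoopOrder
    {
        static void Main()
        {
            const int n = 4096;
            var grid = new int[n, n]; // .NET 2D arrays are stored row by row
            long sum = 0;

            // Row-major order walks memory contiguously, so each cache line is fully used.
            var sw = Stopwatch.StartNew();
            for (int row = 0; row < n; row++)
                for (int col = 0; col < n; col++)
                    sum += grid[row, col];
            Console.WriteLine($"row-major:    {sw.ElapsedMilliseconds} ms");

            // Column-major order jumps n * 4 bytes per step, wasting most of each cache line.
            sw.Restart();
            for (int col = 0; col < n; col++)
                for (int row = 0; row < n; row++)
                    sum += grid[row, col];
            Console.WriteLine($"column-major: {sw.ElapsedMilliseconds} ms");

            Console.WriteLine(sum); // keep the loops from being optimized away
        }
    }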
     
  50. neginfinity

    neginfinity

    Joined:
    Jan 27, 2013
    Posts:
    13,301
    ... which means the system would require specialized tools in order to properly utilize it. And probably a couple of OS patches as well.

    No, thanks, I'll pass. Make it mainstream first, then it'll be reasonable to consider something like this.