[Idea] Unity with C# to GPU power!

Discussion in 'General Discussion' started by Arowx, Jan 7, 2015.

  1. lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    I actually haven't tackled this issue yet. Check out the OpenCL.Net Unity repository; I think they made an API for GL sharing, and maybe one for DX9/DX11 (not sure):
    https://github.com/leith-bartrich/openclnet_unity

    In any case, you can take a look at which native functions they're calling to get that done.
     
  2. Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Didn't know they had that API :O Good to know, thanks. They don't support DX for now, but I should be able to either implement that based on the OGL one or fall back to copying over the host.
     
  3. Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    What about those functions in your bindings? There are a lot of functions (originally defined in CL10, the same as in OpenCL.Net, although marked as deprecated in CL12...) which are used for "converting" all kinds of GL objects over to CL! Don't they work correctly?
     
  4. lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
    They should, but I haven't tested them :p I haven't needed to share buffers yet. I recommend staying away from the deprecated ones though.
     
  5. Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    Lol, found that even Cloo-Unity provides functions for CL/GL interop, at least for Windows ;) Cool, that will cut down my workload a lot.

    My plan is to (hopefully) also support Mac/Linux and eventually DirectX...

    EDIT: I wouldn't trust the deprecation note, because even CreateImage2D is marked as such, and both the internal library and OpenCL.Net Unity still use it.
     
    Last edited: May 6, 2015
  6. Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    @Thinksquirrel Any idea how I could retrieve the OpenGL (and later D3D) context Unity is using? There are functions to get the current context on the current thread, but I guess I would have to execute that on Unity's graphics thread, which is as inaccessible as its context xD
     
  7. lilymontoute

    Joined:
    Feb 8, 2011
    Posts:
    1,181
  8. Seneral

    Joined:
    Jun 2, 2014
    Posts:
    1,206
    I'm aware of the texture pointer, but what I need is a CL context shared with OpenGL, i.e. the existing Unity graphics context in which the RenderTexture is stored.
    The only hope I see at the moment is to get C# (not native) code executed on the render thread using GL.IssuePluginEvent ;) I don't know how to approach that, however...

    EDIT: Bad news. According to this thread (if it's still up to date), I really would need to write a native code plugin, just because UT had the horrible idea of limiting low-level rendering access to those!! That means a solid interop API would really have to be built that way. Please correct me if I'm wrong. I'll just put the code needed to get the context there.
     
    Last edited: May 7, 2015
  9. elmar1028

    Joined:
    Nov 21, 2013
    Posts:
    2,359

    I know this post is over a year old... but here goes!

    True, 2D and 3D are handled the same way, but 2D games use less rendering power than 3D games because fewer triangles need to be rendered.
     
  10. zombiegorilla

    Moderator

    Joined:
    May 8, 2012
    Posts:
    9,052
    Generally that is true. I'm sure there are exceptions, but they are rare. Another thing is that 2D generally uses less complex shaders.
     
  11. specweapons

    Joined:
    Dec 23, 2016
    Posts:
    7
    Yes, an API: maybe C++ AMP. A heterogeneous (hardware-independent) supercomputer on every desk, in every living room, in every pocket! The mainstream! https://channel9.msdn.com/Events/AMD-Fusion-Developer-Summit/AMD-Fusion-Developer-Summit-11/KEYNOTE
    I am already using this in Unity to access GPU FFT!
     
  12. specweapons

    Joined:
    Dec 23, 2016
    Posts:
    7
    Actually it makes a lot of sense. Take a simple task like speeding up the rendering of line segments. The Unity3D LineRenderer is useless without tricks like this; in projects like the code above (Imbarns' Kinect 2 projects), this kind of GPU processing is mandatory just to rescue something as simple as LineRenderer. I had to write the simplest 2D-like line renderer (UVs facing the viewer) to get anything near real time, and set the play window's quality to at least Fast, sacrificing graphics quality. However, as in the Kinect 2 project mentioned above, if my audio data were written to memory every 1/60 of a second, I could recompute the new vertices. Since the line is semi-2D, only the Vector3.y value of each line point would change, so all of them could be computed in parallel, at once, instead of one after another.
    Odd Vector3.y vertices would be computed by some kernel function as line_point.y - (line width/2), and even ones as line_point.y + (line width/2). Of course, since we now do it on the GPU without the overhead of method calls, stack pointers, heap allocations and the like, we could make it more than a simple line and engineer fancy lines, with fancy line functions, line joints, etc.
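    A minimal CPU sketch of the vertex rule above (even vertex at y + width/2, odd vertex at y - width/2). It uses Parallel.For (available in Unity once the .NET 4.x runtime discussed later in the thread is enabled) instead of a GPU kernel, and plain float arrays instead of UnityEngine.Vector3 so it compiles outside Unity. `LineRibbon` and `BuildRibbonY` are hypothetical names, not part of any Unity API.

```csharp
using System;
using System.Threading.Tasks;

public static class LineRibbon
{
    // For each line point i, emit two vertex heights:
    //   even vertex 2i   = y + width/2
    //   odd  vertex 2i+1 = y - width/2
    // Every iteration writes only its own two slots, so the loop is safe to run in parallel.
    public static float[] BuildRibbonY(float[] pointY, float width)
    {
        if (pointY == null) throw new ArgumentNullException(nameof(pointY));
        float half = width * 0.5f;
        var vertexY = new float[pointY.Length * 2];
        Parallel.For(0, pointY.Length, i =>
        {
            vertexY[2 * i] = pointY[i] + half;
            vertexY[2 * i + 1] = pointY[i] - half;
        });
        return vertexY;
    }
}
```

    Copying the result into an actual Mesh is left out on purpose, since that part has to run on Unity's main thread.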

    That's not true. Most of the fastest supercomputers now use multiple GPU cards.
    NVIDIA GPUs Now Power World’s 10 Greenest Supercomputers


    Universities had better be teaching parallelism now instead of procedural, functional, or concurrent programming.
    http://software.mindsight.co Advanced Scientific Computing
    I just think it needs to be heterogeneous (hardware-independent) and almost invisible to the programmer, like C++ AMP (Accelerated Massive Parallelism). I am sure we all know 15 boring professors who will say the world is still flat.
    AMD Fusion Developer Summit 11 AFDS Keynote: Herb Sutter - Heterogeneous Computing and C++ AMP

    The Advancing March of Parallelism
    I am using this by writing very short GPU C++ AMP DLLs, wrapping them with C#, and then providing them to my Unity3D script to process data.

    Howard H. Aiken, as quoted in Portraits in Silicon (1987) by Robert Slater:
    Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats.
     

    Last edited: Apr 8, 2017
  13. specweapons

    Joined:
    Dec 23, 2016
    Posts:
    7
    Nice, I will have to see if I can use this to speed up my oscilloscope tool.
     
  14. ChazBass

    Joined:
    Jul 14, 2013
    Posts:
    153
    This is the answer right here, in my opinion.

    In my day job, I happen to work in an industry where we run models which involve parallel processing on massive grids. It works great for specific uses. However, it's easy for people to start thinking this is a good solution for everything, even for problems that are not inherently parallelizable. The really hard part is actually setting up the processing: marshalling data to the right nodes at the right time, doing the processing and then reassembling the results, all in a manner that makes the overall system both compliant and performant.

    So, back to games: even if you can convince me that the GPU doesn't already have its hands full, where exactly would you be able to take advantage of this so that it results in a net benefit to the overall system? Sure, some problems like pathfinding could benefit, but that is a lot of information to marshal back and forth, so I wonder if you wouldn't be better off skipping that and using threads on the CPU instead.
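    As a sketch of the CPU-thread alternative: independent path queries over a shared, read-only grid can run on worker threads with no marshalling at all, because each query only reads the grid and writes its own result slot. Plain breadth-first search on a 4-connected grid stands in here for whatever pathfinder a game actually uses (A*, navmesh queries); `GridPaths`, its methods, and the `bool[x, y]` walkability layout are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class GridPaths
{
    // Breadth-first search on a 4-connected grid; returns the step count, or -1 if unreachable.
    public static int ShortestPathLength(bool[,] walkable, int sx, int sy, int tx, int ty)
    {
        int w = walkable.GetLength(0), h = walkable.GetLength(1);
        if (!walkable[sx, sy]) return -1;
        var dist = new int[w, h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                dist[x, y] = -1;

        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };
        var queue = new Queue<int>();            // cells encoded as y * w + x
        dist[sx, sy] = 0;
        queue.Enqueue(sy * w + sx);
        while (queue.Count > 0)
        {
            int cell = queue.Dequeue();
            int cx = cell % w, cy = cell / w;
            if (cx == tx && cy == ty) return dist[cx, cy];
            for (int k = 0; k < 4; k++)
            {
                int nx = cx + dx[k], ny = cy + dy[k];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                if (!walkable[nx, ny] || dist[nx, ny] >= 0) continue;
                dist[nx, ny] = dist[cx, cy] + 1;
                queue.Enqueue(ny * w + nx);
            }
        }
        return -1;
    }

    // Runs many independent queries {sx, sy, tx, ty} in parallel; each writes only results[i].
    public static int[] SolveAll(bool[,] walkable, int[][] queries)
    {
        var results = new int[queries.Length];
        Parallel.For(0, queries.Length, i =>
        {
            int[] q = queries[i];
            results[i] = ShortestPathLength(walkable, q[0], q[1], q[2], q[3]);
        });
        return results;
    }
}
```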
     
  15. Arowx

    Joined:
    Nov 12, 2009
    Posts:
    8,194
    So when your CPU is spending its 16 ms working on what to draw next, doing the physics, player input, game logic and engine work, what exactly is your GPU doing apart from quickly drawing the last frame and then mostly waiting for the next one?

    Also, how much of your GPU is being used to draw frames? Are you using all the memory/bandwidth on your GPU for the full 16 ms between frames?

    The biggest problem with processing on the GPU is probably the bandwidth between the CPU and GPU, and their respective RAM access.

    Isn't that what the "too many draw calls" problem and the need for Vulkan / DX12 were all about? We have these powerful GPUs that were not able to show off what they could do because of CPU / GPU bandwidth limitations.
     
  16. sngdan

    Joined:
    Feb 7, 2014
    Posts:
    1,154
  17. Dustin-Horne

    Joined:
    Apr 4, 2013
    Posts:
    4,568
    That's exactly what was meant by marshalling of data. If something takes 16 ms to process but involves a lot of data (such as nav mesh / node states), and it takes 14 ms to send the data and receive the result, it's not a gain unless the GPU can do the calculations in under 2 ms. Those are just made-up numbers, but the point still stands. GPUs also support a much smaller set of data types, so you may spend extra processing power just transforming the data into something usable.
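    The arithmetic above reduces to one inequality: offload only if transfer time plus GPU compute time is less than the CPU time. A tiny helper using the post's made-up numbers; the class and method names are hypothetical, and real timings would have to be measured on the target hardware.

```csharp
using System;

public static class OffloadCheck
{
    // Offloading wins only if moving the data both ways plus GPU compute beats the CPU time.
    public static bool IsWorthOffloading(double cpuMs, double transferMs, double gpuMs)
    {
        return transferMs + gpuMs < cpuMs;
    }

    // Largest GPU compute time (ms) that still breaks even; 0 if transfer alone is too slow.
    public static double GpuBudgetMs(double cpuMs, double transferMs)
    {
        return Math.Max(0.0, cpuMs - transferMs);
    }
}
```

    For example, `GpuBudgetMs(16, 14)` gives the 2 ms from the post. The model deliberately ignores overlapping transfers with computation, which can change the picture when an API supports pipelining.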
     
  18. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
  19. Flurgle

    Joined:
    May 16, 2016
    Posts:
    389
    @Tugrul_512bit That looks amazing. Do you have any example Unity projects using it?
     
  20. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    @Flurgle Thank you. Long ago I wrote a sphere-type particle exclusion solver in which 64k spheres collided instead of passing through each other, like a big atomic nucleus with neutrons and protons. I don't have it right now, but I can write a new one, re-optimize the other parts (which I've forgotten) to decrease draw calls, and put it on YouTube if you want.
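    A brute-force exclusion pass of that kind can be sketched as follows. This is not the author's solver (which the post says is lost) and runs on the CPU rather than the GPU; it is a hypothetical Jacobi-style step in which every sphere reads all positions from the previous state and writes only its own new position, so the outer loop can run in parallel. Equal radii and the half-overlap push are assumptions.

```csharp
using System;
using System.Threading.Tasks;

public static class SphereExclusion
{
    // One relaxation pass: every pair of spheres closer than 2 * radius is pushed apart.
    // Results go to separate output arrays so no thread reads a value another thread writes.
    public static void Step(float[] x, float[] y, float[] z, float radius,
                            float[] nx, float[] ny, float[] nz)
    {
        int n = x.Length;
        float minDist = 2f * radius;
        Parallel.For(0, n, i =>
        {
            float px = x[i], py = y[i], pz = z[i];
            float ox = 0f, oy = 0f, oz = 0f;
            for (int j = 0; j < n; j++)
            {
                if (j == i) continue;
                float dx = px - x[j], dy = py - y[j], dz = pz - z[j];
                float d = (float)Math.Sqrt(dx * dx + dy * dy + dz * dz);
                // Skip separated pairs, and coincident centres whose push direction is undefined.
                if (d >= minDist || d < 1e-6f) continue;
                // Move away by half the overlap along the line joining the centres.
                float push = 0.5f * (minDist - d) / d;
                ox += dx * push; oy += dy * push; oz += dz * push;
            }
            nx[i] = px + ox; ny[i] = py + oy; nz[i] = pz + oz;
        });
    }
}
```

    Repeating the pass a few times per frame lets chains of overlapping spheres settle; a real 64k-sphere solver would also need a spatial grid to avoid the O(N²) inner loop.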

    I even tried the cluster part of it, but Unity somehow blocks LAN over TCP/IP, or I'm just a noob at that :(
     
  21. Flurgle

    Joined:
    May 16, 2016
    Posts:
    389
    @Tugrul_512bit That sounds amazing. You'd do a great service if you shared that, maybe on GitHub. Much appreciated. :)
     
  22. ZJP

    Joined:
    Jan 22, 2010
    Posts:
    2,649
    Thanks for this. A lot... :cool:
     
  23. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    I couldn't find the solver code, but I'm now writing a very simple n-body interaction which uses x, y, z, vx, vy, vz, fx, fy, fz arrays and computes 16k particles by brute force in 16 milliseconds. When it's done, I'll make a video using OBS Studio and upload it to YouTube. Why 16k particles instead of 64k? Because I had an HD 7870 before. Now I only have an R7 240, which has about 1/5 of that card's compute power, maybe even less.
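    A brute-force n-body pass over separate x, y, z / vx, vy, vz / fx, fy, fz arrays, as described above, might look like this on the CPU with Parallel.For. The post's own version runs through the author's OpenCL-based API and is not reproduced here; unit masses, G = 1, the softening term `eps2`, and the semi-implicit Euler step are assumptions.

```csharp
using System;
using System.Threading.Tasks;

public static class NBody
{
    // O(N^2) gravity-style forces with unit masses and G = 1.
    // eps2 (softening) keeps the force finite when two particles get very close.
    public static void ComputeForces(float[] x, float[] y, float[] z,
                                     float[] fx, float[] fy, float[] fz, float eps2)
    {
        int n = x.Length;
        Parallel.For(0, n, i =>
        {
            float ax = 0f, ay = 0f, az = 0f;
            for (int j = 0; j < n; j++)
            {
                if (j == i) continue;
                float dx = x[j] - x[i], dy = y[j] - y[i], dz = z[j] - z[i];
                float r2 = dx * dx + dy * dy + dz * dz + eps2;
                float inv = 1f / (float)Math.Sqrt(r2);
                float inv3 = inv * inv * inv;          // 1 / r^3
                ax += dx * inv3; ay += dy * inv3; az += dz * inv3;
            }
            fx[i] = ax; fy[i] = ay; fz[i] = az;       // each thread writes only particle i
        });
    }

    // Semi-implicit Euler: update velocity from force, then position from the new velocity.
    public static void Integrate(float[] x, float[] y, float[] z,
                                 float[] vx, float[] vy, float[] vz,
                                 float[] fx, float[] fy, float[] fz, float dt)
    {
        Parallel.For(0, x.Length, i =>
        {
            vx[i] += fx[i] * dt; vy[i] += fy[i] * dt; vz[i] += fz[i] * dt;
            x[i] += vx[i] * dt; y[i] += vy[i] * dt; z[i] += vz[i] * dt;
        });
    }
}
```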

    While I write that, you may spend some time with this:


    This is also from ages ago with the HD 7870, but in jMonkeyEngine rather than Unity :D
     
  24. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    Here is a very unoptimized version.

    I tried updating the transforms of the 16k cubes from multiple threads, but Unity won't let me do that.

    I'm also lazy (and have so much housework to do), so I didn't use a mesh deformer + point draw mode; I just instantiated cubes.

    On top of this, the screen recorder is using the GPU (which has just 320 cores and 25 GB/s of memory bandwidth), which increases the average kernel time from 17 ms to 22 ms.

    When the particles start moving, batching becomes slower too! Maybe overlapping cubes also make it somewhat slower for the renderer.

    But it works at least.

     
  25. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    Nearly 300-400 GFLOPS (I don't know how many FLOPs a sqrt counts as), and the card's theoretical limit is 592 GFLOPS at 925 MHz.
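    The 592 GFLOPS limit follows from 320 shader cores × 2 FLOPs per cycle (one fused multiply-add) × 0.925 GHz. A rough achieved figure for an all-pairs kernel is interactions × FLOPs per interaction / time; the per-interaction count is the uncertain part (as the post says, it depends on how a sqrt is counted), so the 20 used in the usage line is only an assumption.

```csharp
public static class FlopMath
{
    // Theoretical peak in GFLOPS: cores * FLOPs per core per cycle * clock in GHz.
    public static double PeakGflops(int cores, int flopsPerCycle, double clockGHz)
    {
        return cores * flopsPerCycle * clockGHz;
    }

    // Achieved GFLOPS for an all-pairs kernel doing about n * n interactions in `seconds`
    // (the n self-pairs the kernel skips are negligible at this size).
    public static double AchievedGflops(long n, double flopsPerInteraction, double seconds)
    {
        return (double)n * n * flopsPerInteraction / seconds / 1e9;
    }
}
```

    With the assumed 20 FLOPs per pair, `FlopMath.AchievedGflops(16384, 20, 0.017)` (16k particles at the ~17 ms kernel time mentioned above) comes out at roughly 316 GFLOPS, in line with the quoted 300-400 range; a different per-pair count scales the figure proportionally.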
     
  26. Flurgle

    Joined:
    May 16, 2016
    Posts:
    389
    @Tugrul_512bit this stuff is fascinating, keep it up (do you have a blog?).
     
  27. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
  28. ippdev

    Joined:
    Feb 7, 2010
    Posts:
    3,853
    Love it. @Tugrul-512bit comes in and slays the luddite dragon squad
     
  29. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    I don't know if the project can be compiled for Linux. I also didn't try it with Mono. But isn't Unity basically Mono? Maybe the only obstacle is System.Threading.dll? Could a simple "using" make it unnecessary in Mono and make the project Linux-compatible?

    I only recently learned object-oriented programming, because I'm a physics engineer and have concentrated on solving physics fast, more in C than in C#. But object-oriented programming is very entertaining, I mean the depths of it, when doing more than just inheriting a base class.

    Why didn't Unity adopt CUDAfy? It already has PhysX, doesn't it?
     
    Last edited: Apr 16, 2017
  30. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    You can offload some of the wave equation calculation to the GPU:

    The next version of the API will have array-of-struct support, so we won't need to copy to/from primitive arrays (which should make it at least 10 times faster).
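    For reference, the core of a wave-equation update is a small stencil evaluated independently at every grid point, which is why it offloads well. A minimal 1D finite-difference sketch with fixed ends, assuming the squared Courant number c2 = (c·dt/dx)² is at most 1 for stability; the class and parameter names are hypothetical, and the kernel used in the post is not shown in the thread.

```csharp
using System.Threading.Tasks;

public static class Wave1D
{
    // Leapfrog scheme:
    //   next[i] = 2 u[i] - uPrev[i] + c2 * (u[i-1] - 2 u[i] + u[i+1])
    // Interior points are independent, so they can be updated in parallel.
    public static float[] Step(float[] uPrev, float[] u, float c2)
    {
        int n = u.Length;
        var next = new float[n];
        Parallel.For(1, n - 1, i =>
        {
            next[i] = 2f * u[i] - uPrev[i] + c2 * (u[i - 1] - 2f * u[i] + u[i + 1]);
        });
        return next; // next[0] and next[n - 1] stay 0: fixed (clamped) boundaries
    }
}
```

    Each call advances one time step; the caller keeps the last two arrays and rotates them (`uPrev = u; u = next;`).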

     
    Last edited: Apr 16, 2017
  31. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    A Parallel.For would be nice. It is in newer versions of Mono. I've written pure Mono applications with Parallel.For and got a 2-3x speedup.

    However, I don't think this will ever happen in Unity, as it is still on an old version of Mono.
     
  32. ZJP

    Joined:
    Jan 22, 2010
    Posts:
    2,649

    Yes please, put the Unity test on your git. :cool:
     
    Last edited: Apr 25, 2017
  33. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Well, to be fair, only certain types of subroutine would run faster on a GPU: those that require millions of similar calculations. So just putting [GPU] [/GPU] around certain things wouldn't necessarily make them faster. But if there were a way to write compute shaders purely in C# that would also work on non-DirectX 11 hardware, that would make everyone happy. So why not? In fact, I would like it if ALL shader code could be written in C#. I hate shader code too! :( Such a pain in the butt.

    So why are there ComputeShaders? Heh.:p
     
  34. Ryiah

    Joined:
    Oct 11, 2012
    Posts:
    21,175
  35. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    @ZJP I was away from the computer, but now I'm adding support for user-defined struct arrays. Then it will compute directly on Vector3 arrays without needing to unbox + copy_to_primitive_array + copy_from_primitive_array + box, so when v1.1.9 is finished, I'll put up a Unity project with a mesh-deformed surface example.
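    The copies being removed here are the round trip between an array of structs (Vector3[]) and a packed primitive array. A sketch of that round trip, using a hypothetical `Vec3` struct as a stand-in for UnityEngine.Vector3 so it compiles outside Unity; it makes visible the two extra passes over memory per frame that direct struct-array support avoids. The packed x0, y0, z0, x1, ... layout is an assumption about what "primitive array" means here.

```csharp
// Hypothetical stand-in for UnityEngine.Vector3 (same field order: x, y, z).
public struct Vec3
{
    public float X, Y, Z;
    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }
}

public static class Layout
{
    // Array of structs -> packed float[] (x0, y0, z0, x1, y1, z1, ...): the copy before the kernel.
    public static float[] Flatten(Vec3[] src)
    {
        var dst = new float[src.Length * 3];
        for (int i = 0; i < src.Length; i++)
        {
            dst[3 * i] = src[i].X;
            dst[3 * i + 1] = src[i].Y;
            dst[3 * i + 2] = src[i].Z;
        }
        return dst;
    }

    // Packed float[] -> array of structs: the copy back after the kernel has run.
    public static void Unflatten(float[] src, Vec3[] dst)
    {
        for (int i = 0; i < dst.Length; i++)
            dst[i] = new Vec3(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
    }
}
```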
     
  36. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Good news indeed. :)

    It works!!

    Code (csharp):
    using UnityEngine;
    using System.Threading.Tasks;

    public class Test : MonoBehaviour {
        void Start() {
            // Runs the body for x = 0..99, potentially on several thread-pool threads,
            // so the log lines may appear in any order.
            Parallel.For(0, 100, x => {
                Debug.Log("Number = " + x + "\n");
            });
        }
    }
    IntelliSense in Mono is not recognising Parallel, but it compiles. I'm not sure if it really is running on multiple CPUs; it seems to be splitting the work over 4 CPUs, which is good. I hope so, because then I can speed up some calculations. Nice, and about time too! This alone is worth getting Unity 2017 for. It could speed up procedural calculations by up to 4 times, or even 8 times for someone with an 8-core laptop!
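    One way to check whether Parallel.For really spreads work over several cores is to record which managed thread ran each iteration. A small sketch in plain C#; the class name, the iteration count, and the artificial work inside the loop are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelProbe
{
    // Returns how many distinct threads executed iterations of the loop.
    public static int DistinctThreads(int iterations)
    {
        var seen = new ConcurrentDictionary<int, bool>();
        Parallel.For(0, iterations, i =>
        {
            seen.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            // Some busy work so the scheduler has a reason to bring in more threads.
            double acc = 0;
            for (int k = 0; k < 10000; k++) acc += Math.Sqrt(k + i);
            if (acc < 0) Console.WriteLine(acc); // never true; keeps the work from being optimized away
        });
        return seen.Count;
    }
}
```

    Comparing `ParallelProbe.DistinctThreads(1000)` with `Environment.ProcessorCount` gives a rough idea of how much of the CPU is in use. The count depends on the machine and on thread-pool scheduling, so short loops may report fewer threads than there are cores.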

    This is just the right timing for the procedural game I'm making.

    (Psst. But let's keep this a secret, I don't want other games to be as fast as mine!)

    update:

    OK, so IntelliSense works, but I had to set the .NET version to 4.5.1 in Mono because it says .NET 4.6 is not installed. Perhaps they forgot to include 4.6 in the latest Mac beta build?
     
    Last edited: Apr 16, 2017
  37. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Here's a benchmark test you can try, for anyone who hasn't used Parallel.For before:
    Code (csharp):
    using UnityEngine;
    using System.Threading.Tasks;

    public class Test : MonoBehaviour {
        public bool doParallel = false;
        int[] x = new int[10000];

        public void Update() {
            if (doParallel) {
                // 1024 outer iterations spread across worker threads;
                // each iteration writes only x[n], so there are no data races.
                Parallel.For(0, 1024, n => {
                    for (int y = 0; y < 10000; y++) {
                        x[n] = n;
                    }
                });
            } else {
                // The same work on the main thread only.
                for (int n = 0; n < 1024; n++) {
                    for (int y = 0; y < 10000; y++) {
                        x[n] = n;
                    }
                }
            }
        }
    }
    Just put that on an empty GameObject (Unity 2017 beta, .NET 4.6 enabled). On my MacBook Air I get:

    doParallel unchecked:
    CPU 95ms, 11 FPS

    doParallel checked:
    CPU 37ms, 26.6 FPS

    A massive improvement! You need something substantial inside the Parallel.For body to make it worthwhile.
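    When the body of each iteration is tiny, a common .NET remedy is to hand each worker a contiguous range instead of single indices, so the per-iteration delegate overhead is paid once per chunk. A sketch using Partitioner.Create from System.Collections.Concurrent; this is a standard technique shown for comparison, not something the benchmark above used, and `ChunkedFill` is a hypothetical name.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedFill
{
    // Fills data[i] = i. Partitioner.Create(from, to) yields Tuple<int, int> ranges [Item1, Item2),
    // so the delegate runs once per range rather than once per element.
    public static void Fill(int[] data)
    {
        if (data.Length == 0) return; // Partitioner.Create requires to > from
        var ranges = Partitioner.Create(0, data.Length);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = i;
        });
    }
}
```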

    Here is a nicer benchmark, a superfast fractal:
    Code (csharp):
    #define USE_PARALLEL
    using System;
    using UnityEngine;
    using UnityEngine.UI;
    #if USE_PARALLEL
    using System.Threading.Tasks;
    #endif

    //--------------------------------Benchmark------------------------------//
    //  Place this script on a RawImage centered in the middle of the screen //
    //  Comment out the very first line to try it without parallelism.       //
    //  Requires Unity 2017 beta. Set to .NET 4.6 or 4.5.1.                  //
    //-----------------------------------------------------------------------//

    public class Test : MonoBehaviour
    {
        const int WIDTH = 512;
        const int HEIGHT = 512;
        const int MAX_ITER = 50;

        RawImage image;
        Texture2D texture;
        static byte[] data;

        static int frames = 0;
        static double midX = -0.5;
        static double midY = 0;
        static double size = 1.5;

        void Start()
        {
            image = GetComponent<RawImage>();
            texture = new Texture2D(WIDTH, HEIGHT, TextureFormat.ARGB32, false);
            data = texture.GetRawTextureData();
        }

        void Update()
        {
            // Steer the zoom towards the mouse, relative to the screen centre.
            int x = (int)Input.mousePosition.x - Screen.width / 2;
            int y = (int)Input.mousePosition.y - Screen.height / 2;

            float X = x * 1.0f / WIDTH;
            float Y = y * 1.0f / HEIGHT;

            size *= 0.99f;   // zoom in a little every frame
            float dx = 0.05f;
            midX = midX + X * size * dx;
            midY = midY + Y * size * dx;

            Mandelbrot();

            texture.LoadRawTextureData(data);
            texture.Apply();
            image.texture = texture;
        }

        static void Mandelbrot()
        {
            frames++;

            double minX = midX - size;
            double maxX = midX + size;
            // Keep the aspect ratio by scaling the half-height, not the centre.
            double minY = midY - size * HEIGHT / WIDTH;
            double maxY = midY + size * HEIGHT / WIDTH;

            int width = WIDTH;
            int height = HEIGHT;
            int R = WIDTH * 4;   // bytes per row (4 bytes per ARGB32 pixel)

            double cx = (maxX - minX) / width;
            double cy = (maxY - minY) / height;

    #if USE_PARALLEL
            // Rows are independent, so they are distributed across worker threads.
            Parallel.For(0, height, y =>
    #else
            for (int y = 0; y < height; y++)
    #endif
            {
                for (int x = 0; x < width; x++)
                {
                    double r0 = x * cx + minX;
                    double i0 = y * cy + minY;
                    double r = r0;
                    double i = i0;
                    int n = 0;
                    double r2 = r * r;
                    double i2 = i * i;
                    while (r2 + i2 < 4 && n < MAX_ITER)
                    {
                        i = 2 * r * i + i0;
                        r = r2 - i2 + r0;
                        r2 = r * r;
                        i2 = i * i;
                        n++;
                    }
                    n = (n == MAX_ITER ? 0 : n + frames);
                    int p = y * R + x * 4;
                    data[p] = 255;                 // A (ARGB32 byte order is A, R, G, B)
                    data[p + 1] = (byte)(n * 5);   // R
                    data[p + 2] = (byte)(n * 7);   // G
                    data[p + 3] = (byte)(n * 11);  // B
                }
            }
    #if USE_PARALLEL
            );
    #endif
        }
    }

    I get 25-30 FPS with parallel and 10-15 FPS without. (Yes, it could probably be done quicker with shaders or a C++ library, but that's not the point!)
     
    Last edited: Apr 16, 2017
  38. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    APPROACHING THE SOUND BARRIER!

    I have some bad news. Even though the Unity project contains only simple things, it is 433 MB, and I couldn't find the exact filter to apply for GitHub, but I'll give it some time tomorrow.

    There is also good news.

    Here is the video, showing 3x the performance of the CPU solution.




    For now, I'm only posting the script file, which just creates a sphere and deforms it. It needs a very simple prefab to get some coordinates, but you can remove that and use coordinates in front of your camera instead. Then it should work.
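    For readers without the script, the kind of work it does (deforming a sphere's vertices every frame) can be sketched as a radial displacement applied to each vertex independently. This is a hypothetical CPU version over packed x, y, z float arrays, not the code in Kamera.cs; the sine-wave displacement and the `amplitude`, `frequency` and `time` parameters are assumptions.

```csharp
using System;
using System.Threading.Tasks;

public static class SphereDeform
{
    // For a unit sphere each vertex position is also its normal, so scaling the position by
    // (1 + amplitude * sin(frequency * y + time)) moves the vertex along its normal.
    public static void Deform(float[] basePos, float[] outPos,
                              float amplitude, float frequency, float time)
    {
        int count = basePos.Length / 3;
        Parallel.For(0, count, v =>
        {
            float x = basePos[3 * v], y = basePos[3 * v + 1], z = basePos[3 * v + 2];
            float scale = 1f + amplitude * (float)Math.Sin(frequency * y + time);
            outPos[3 * v] = x * scale;
            outPos[3 * v + 1] = y * scale;
            outPos[3 * v + 2] = z * scale;
        });
    }
}
```

    Each vertex costs only a handful of arithmetic operations per 24 bytes read and written, which is the low compute-per-byte situation the benchmarks further down run into.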

    Here is the script that was attached to the camera:

    https://github.com/tugrul512bit/Cekirdekler/blob/master/Kamera.cs

    Don't forget to put the binaries next to the Unity Editor exe file if you run this in the editor, and to add the binaries as assets too. Then you should be able to reference Cekirdekler.dll and System.Threading.dll (if you need it).
     
  39. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Can you make it any faster using "Parallel.For" or "unsafe" code? How did you get it to use 8 CPU cores? Is that just by using Unity 2017? How about precomputing an array of square root, sine and cosine values (say, every 1 degree) so they can be looked up quicker?
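    The precomputed-table idea looks like this: a 360-entry sine table at 1° resolution with nearest-entry lookup. A sketch only, with a hypothetical class name; at this resolution the lookup error is bounded by roughly sin(0.5°) ≈ 0.0087, and whether it beats Math.Sin depends on the hardware, as the reply below discusses.

```csharp
using System;

public static class SinTable
{
    // sin(k degrees) for k = 0..359, built once.
    static readonly float[] Table = Build();

    static float[] Build()
    {
        var t = new float[360];
        for (int k = 0; k < 360; k++)
            t[k] = (float)Math.Sin(k * Math.PI / 180.0);
        return t;
    }

    // Nearest-degree lookup; wraps any angle (including negative ones) into [0, 360).
    public static float SinDeg(float degrees)
    {
        int k = (int)Math.Round(degrees) % 360;
        if (k < 0) k += 360;
        return Table[k];
    }
}
```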
     
  40. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    I'm using Unity 5.3.4f1, and it uses all cores by itself for rendering, updating and some other things. I'm not using multiple threads myself, but the API has 2 workers in a Parallel.For and they do very light work. Most of the CPU usage is background work by Unity, video recording and a few other things.

    A look-up table is faster when RAM is faster than the CPU, but nowadays CPUs are an order of magnitude faster than RAM. On a GPU, the gap is even bigger.

    Also, big look-up tables ruin cache efficiency.

    On a GPU, a single core can calculate 3-5 trigonometric functions before data arrives from memory.

    Actually, I forgot to enable pipelining on Vector3 -_- let me benchmark it.
     
  41. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    Hmm... it depends on how sine and cosine are calculated. They usually take several steps compared with, say, a simple multiplication. In general I've found lookups faster for these things, but maybe it depends on your CPU.

    So did you have to specially set up Unity 5.3 to use the latest .NET?

    Good job. :)
     
  42. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    Thank you. I didn't set any project properties, because the properties window doesn't open for Unity projects for some reason.

    Benchmark result:

    Mesh deformation is still memory-bound. I removed the GPU, used only the CPU, and enabled 4-blob driver-based pipelining.

    Speedup = 3.3x

    So it is 10% faster without the GPU (memory can't keep up with all devices; the kernel does too little compute per byte).
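    The "too little compute per byte" argument is the usual roofline estimate: attainable throughput is the smaller of the compute peak and (FLOPs per byte × memory bandwidth). A sketch using the R7 240 figures quoted earlier in the thread (592 GFLOPS peak, about 25 GB/s); the intensity of 1 FLOP/byte in the usage note is an assumed example, not a measurement of the deformation kernel.

```csharp
using System;

public static class Roofline
{
    // GFLOPS achievable for a kernel with the given arithmetic intensity (FLOPs per byte moved).
    public static double AttainableGflops(double peakGflops, double bandwidthGBs, double flopsPerByte)
    {
        return Math.Min(peakGflops, bandwidthGBs * flopsPerByte);
    }

    // Intensity (FLOPs/byte) above which the kernel stops being memory-bound.
    public static double RidgePoint(double peakGflops, double bandwidthGBs)
    {
        return peakGflops / bandwidthGBs;
    }
}
```

    At 1 FLOP/byte, `Roofline.AttainableGflops(592, 25, 1)` is only 25 GFLOPS, and the ridge point sits near 24 FLOPs/byte, so a kernel with little arithmetic per vertex cannot use the card's compute peak; adding devices that share the same memory does not help either.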

    Now I'll benchmark with partial reads, without reading whole arrays.
     
  43. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    The fastest option is CPU-only GPGPU (running the kernel on the CPU device), because it is a memory-bound operation. That's why CPU usage was not 100%.

    But I need to enable device partitioning to use only 7 cores, because the last core is needed for the API's internal control.
     
  44. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    Parallel.For makes it 7.2 ms. Not bad. So the only difference between the API and Parallel.For is vectorization (SIMD vs. scalar).

    Vectorization can also be enabled with something like OpenMP, but that needs some hints for the compiler, or very simple computations, to let it vectorize. OpenCL does that automagically.
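    To illustrate SIMD vs. scalar in C# itself: System.Numerics.Vector<T> processes Vector<float>.Count lanes per operation when hardware acceleration is available. A sketch of a y = a·x + y loop with a scalar tail; whether Vector<T> is available, and hardware-accelerated, in a given Unity/Mono version is an assumption to check (`Vector.IsHardwareAccelerated` reports the latter).

```csharp
using System.Numerics;

public static class Saxpy
{
    // y[i] = a * x[i] + y[i], Vector<float>.Count elements per step, then a scalar tail.
    public static void Run(float a, float[] x, float[] y)
    {
        int width = Vector<float>.Count;
        var va = new Vector<float>(a);               // a broadcast to every lane
        int i = 0;
        for (; i <= x.Length - width; i += width)
        {
            var vx = new Vector<float>(x, i);
            var vy = new Vector<float>(y, i);
            (va * vx + vy).CopyTo(y, i);
        }
        for (; i < x.Length; i++)                    // leftover elements
            y[i] = a * x[i] + y[i];
    }
}
```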

    5.5 ms vs. 7.2 ms: only about a 30% difference.

    For an O(N²)-complexity algorithm with a high compute-to-data ratio, the CPU is like 3% of the GPU and even drags the GPU down like a zombie. Devices should be within about 0.1x to 10x of each other's performance to be helpful.
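    One way to see the 0.1x-10x rule of thumb: even if work is split perfectly in proportion to each device's measured throughput, adding a device that is only 3% as fast can at best cut the time by about 3%, while the extra synchronisation is not free. A sketch of such a proportional split; the function names are hypothetical, and the API in the thread does its own balancing, which is not shown here.

```csharp
using System;

public static class WorkSplit
{
    // Splits `total` work items across devices proportionally to their relative speeds.
    // Rounding leftovers go to the fastest device so the counts always sum to `total`.
    public static int[] Split(int total, double[] speeds)
    {
        double sum = 0;
        foreach (double s in speeds) sum += s;
        var counts = new int[speeds.Length];
        int assigned = 0, fastest = 0;
        for (int d = 0; d < speeds.Length; d++)
        {
            counts[d] = (int)Math.Floor(total * speeds[d] / sum);
            assigned += counts[d];
            if (speeds[d] > speeds[fastest]) fastest = d;
        }
        counts[fastest] += total - assigned;
        return counts;
    }

    // Best-case time when every device finishes together: total / (sum of speeds).
    public static double IdealTime(int total, double[] speeds)
    {
        double sum = 0;
        foreach (double s in speeds) sum += s;
        return total / sum;
    }
}
```

    With speeds {1.0, 0.03}, `IdealTime(1000, ...)` drops only from 1000 to about 971 time units, before any coordination overhead is counted.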
     
    Last edited: Apr 16, 2017
  45. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    So you're saying OpenCL is faster than just using Parallel.For loops? Do you need .NET 4.5 to use OpenCL, or does it work with .NET 2.0? I will have a go with OpenCL to see what happens. I'm not exactly sure how it would work unless it's built on .NET functions that Mono can run. I'm working on a Mac, too. I've seen this version; maybe it will work.
     
    Last edited: Apr 17, 2017
  46. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    The project uses Dictionary, List, some generic class implementations and Parallel.For. Nothing very complicated. The other DLL is already C++.

    OpenCL compiler efficiency varies from vendor to vendor; Intel's may do better. You can test it.

    OpenCL needs only a C++ DLL, which exports its functions with dllexport on the C++ side and is called via DllImport from C#.
    I ran it on .NET 2.0 once with System.Threading.dll, and mostly on .NET 3.5 after that; I just kept System.Threading for safety. Using the binaries on .NET 4.6 is no problem.
     
  47. ZJP

    Joined:
    Jan 22, 2010
    Posts:
    2,649
    Remove the Library folder.
     
  48. yoonitee

    Joined:
    Jun 27, 2013
    Posts:
    2,363
    I just realised that if Unity 2017 supports .NET 4.6, it will also support the "async" keyword and the Task API. That will also be really useful for doing things in parallel or running things in the background. Good stuff.
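    A minimal sketch of the async/Task pattern: the heavy loop runs on a thread-pool thread via Task.Run, and the caller awaits the result without blocking its own thread. In Unity, code that touches Unity objects should stay out of the Task.Run body, since most of the Unity API is main-thread-only; the class and method names here are hypothetical.

```csharp
using System.Threading.Tasks;

public static class BackgroundWork
{
    // CPU-heavy work moved off the calling thread.
    public static Task<long> SumOfSquaresAsync(int n)
    {
        return Task.Run(() =>
        {
            long total = 0;
            for (int i = 1; i <= n; i++) total += (long)i * i;
            return total;
        });
    }

    // The caller awaits the result; its thread is free while the task runs.
    public static async Task<string> DescribeAsync(int n)
    {
        long result = await SumOfSquaresAsync(n);
        return $"sum of squares 1..{n} = {result}";
    }
}
```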

    I think someone needs to write a tutorial about parallelism for Unity 2017 which explains all about the System.Threading.Tasks library and how it can be used.
     
  49. Tugrul_512bit

    Joined:
    Apr 9, 2016
    Posts:
    46
    https://github.com/tugrul512bit/unityTestMeshDeformation

    Now it creates the sphere in front of the camera, so it doesn't need any prefab.

    Don't forget to put the binaries next to the Unity editor too, and make the project 64-bit.

    I also forgot to mention that the API uses the Win32 kernel DLL for copying buffers in one or two places. Later I will remove that Windows dependency so it works better on Linux.

    Scene bb is the latest scene I was working on.

    The GitHub add-on was not working for the project for some reason, so I simply uploaded the whole folder after deleting Library and some other unnecessary subfolders.
     
  50. ZJP

    Joined:
    Jan 22, 2010
    Posts:
    2,649
    Sorry for the late response. Thanks for the link. :cool: