Hi, I'm working on a Unity Editor extension for Unity 2018.4 (I expect it to work on newer versions too), and I need to do some computationally intensive work on the GPU. I already have a simple prototype working with a Compute Shader, but it turns out that the only way to run a Compute Shader is by freezing the main thread. I even tried dispatching the Compute Shader from an IJob, but all that did was let the Editor catch the exception without me wrapping the code in my own try-catch. I asked the folks over on the GPU Progressive Lightmapper thread how they managed to run code on the GPU without freezing the main thread, and they said they do it by using OpenCL from a background thread. Does anyone have any recent advice on how to run OpenCL from Unity? Please keep in mind that I've never used OpenCL before. Most of what I found was either people discussing error codes from the GPU Progressive Lightmapper or a few threads from 2012. Unity has changed a lot these last four years, so I'm not very hopeful when it comes to those older threads. Can anyone point me to some resources on running OpenCL from a background thread? Thanks, -Francisco
Hi Francisco! I'm interested in this too, and I'm curious how you got on with it. Did you get OpenCL to work in Unity? I found this project, which might help: https://github.com/leith-bartrich/openclnet_unity Cheers, Chris
Well, I never did bother to get OpenCL working from Unity, but I did figure out how to start an outside process from the Editor. From there, I added UDP communication between the two apps, using FlatBuffers for the message encoding. After that, I looked into how to create a headless(?) Unity game: a Unity game that doesn't have a screen. The end result was a Unity Editor plugin that starts a new process and offloads its work to it. That new process is a headless Unity-based app that runs the data through a Compute Shader and then returns the results via UDP to the Unity Editor that started it. This let me use Unity's Compute Shaders asynchronously from the Unity Editor, so the Editor no longer freezes when a Compute Shader takes more than a few frames to run. So, instead of trying to get OpenCL to run in a background thread from the Editor, I just ran a separate program, which is apparently how the Unity team that created the Progressive Lightmapper does things.
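For anyone who wants to try the same approach, here is a minimal sketch of the request/reply round trip over UDP. It is not the code from the post above: a thread stands in for the separate headless process, raw float bytes stand in for the FlatBuffers messages, and DoubleAll is a hypothetical placeholder for the Compute Shader work. All class and method names are made up for illustration.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Sketch only: a thread stands in for the separate worker process, and a raw
// float array stands in for the FlatBuffers messages mentioned above.
public static class UdpOffloadSketch
{
    // Hypothetical stand-in for the work the headless app would hand to a Compute Shader.
    static float[] DoubleAll(float[] input)
    {
        var output = new float[input.Length];
        for (int i = 0; i < input.Length; i++) output[i] = input[i] * 2f;
        return output;
    }

    // Copies the floats into a byte array so they can travel in one datagram.
    static byte[] Pack(float[] values)
    {
        var bytes = new byte[values.Length * sizeof(float)];
        System.Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static float[] Unpack(byte[] bytes)
    {
        var values = new float[bytes.Length / sizeof(float)];
        System.Buffer.BlockCopy(bytes, 0, values, 0, bytes.Length);
        return values;
    }

    // Worker side: wait for one request, process it, reply to whoever sent it.
    static void ServeOneRequest(UdpClient socket)
    {
        var sender = new IPEndPoint(IPAddress.Any, 0);
        byte[] request = socket.Receive(ref sender);
        byte[] reply = Pack(DoubleAll(Unpack(request)));
        socket.Send(reply, reply.Length, sender);
    }

    // Editor side: send the input and block until the reply arrives.
    // (An Editor plugin would poll for the reply instead of blocking.)
    public static float[] RoundTrip(float[] input)
    {
        // Port 0 asks the OS for any free port on the loopback interface.
        using (var worker = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        using (var editor = new UdpClient(new IPEndPoint(IPAddress.Loopback, 0)))
        {
            var workerEndPoint = (IPEndPoint)worker.Client.LocalEndPoint;
            var thread = new Thread(() => ServeOneRequest(worker));
            thread.Start();

            byte[] request = Pack(input);
            editor.Send(request, request.Length, workerEndPoint);

            var from = new IPEndPoint(IPAddress.Any, 0);
            float[] result = Unpack(editor.Receive(ref from));
            thread.Join();
            return result;
        }
    }

    public static void Main()
    {
        float[] result = RoundTrip(new float[] { 1f, 2f, 3f });
        Console.WriteLine(string.Join("; ", result));
    }
}
```

In the setup described above, the worker would be a separate program started from the Editor (for example with System.Diagnostics.Process.Start) rather than a thread. UDP gives no delivery guarantee, so a real version would probably want a receive timeout and a retry.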
OpenCL in Unity: Minimal Working Example

In Start(), we initialize OpenCL and compile the kernel. In Update(), we pass the current frame number to the kernel, which multiplies it by the index of the current work-item. With three work-items, we read back three values: zero, the current frame number, and the current frame number multiplied by two.

Reference: http://memeplex.blog.shinobi.jp/opencl/

Full Source Code (OpenCL.cs):

Code (CSharp):

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System;
using System.Runtime.InteropServices;
using System.Linq;

public class OpenCL : MonoBehaviour
{
    // OpenCL C source; compiled at runtime in Start().
    static string ComputeKernel = @"
__kernel void myKernelFunction(__global float* items, __const int number)
{
    unsigned int id = get_global_id(0);
    items[id] = number * id;
}
";

    Kernel _Kernel;
    CommandQueue _CommandQueue;
    Buffer _Buffer;

    void Start()
    {
        // Use the first device of the first platform.
        IntPtr device = getDevices(getPlatforms()[0], DeviceType.Default)[0];
        Context context = new Context(device);
        _CommandQueue = new CommandQueue(context, device);
        Program program = new Program(context, ComputeKernel);
        program.Build(device);
        _Kernel = new Kernel(program, "myKernelFunction");
        _Buffer = Buffer.FromCopiedHostMemory(context, new float[3]);
        _Kernel.SetArgument(0, _Buffer);
    }

    void Update()
    {
        _Kernel.SetArgument(1, Time.frameCount);
        // Three work-items in total, one per work-group.
        _CommandQueue.EnqueueRange(_Kernel, new MultiDimension(3), new MultiDimension(1));
        float[] readBack = new float[3];
        // Blocking read: waits here until the kernel has finished.
        _CommandQueue.ReadBuffer(_Buffer, readBack);
        string result = "";
        foreach (var number in readBack)
        {
            result = result + number.ToString("N0") + "; ";
        }
        Debug.Log(result);
    }

    IntPtr[] getDevices(IntPtr platform, DeviceType deviceType)
    {
        // First call gets the count, second call fills the array.
        int deviceCount;
        OpenCLFunctions.clGetDeviceIDs(platform, deviceType, 0, null, out deviceCount);
        IntPtr[] result = new IntPtr[deviceCount];
        OpenCLFunctions.clGetDeviceIDs(platform, deviceType, deviceCount, result, out deviceCount);
        return result;
    }

    IntPtr[] getPlatforms()
    {
        int platformCount;
        OpenCLFunctions.clGetPlatformIDs(0, null, out platformCount);
        IntPtr[] result = new IntPtr[platformCount];
        OpenCLFunctions.clGetPlatformIDs(platformCount, result, out platformCount);
        return result;
    }
}

class Context
{
    public IntPtr InternalPointer { get; private set; }

    public Context(params IntPtr[] devices)
    {
        int error;
        InternalPointer = OpenCLFunctions.clCreateContext(null, devices.Length, devices, null, IntPtr.Zero, out error);
    }

    ~Context()
    {
        OpenCLFunctions.clReleaseContext(InternalPointer);
    }
}

class CommandQueue
{
    public IntPtr InternalPointer { get; private set; }

    public CommandQueue(Context context, IntPtr device)
    {
        int error;
        InternalPointer = OpenCLFunctions.clCreateCommandQueue(context.InternalPointer, device, 0, out error);
    }

    ~CommandQueue()
    {
        OpenCLFunctions.clReleaseCommandQueue(InternalPointer);
    }

    public void ReadBuffer<T>(Buffer buffer, T[] systemBuffer) where T : struct
    {
        // Pin the managed array so the native call can write into it.
        GCHandle handle = GCHandle.Alloc(systemBuffer, GCHandleType.Pinned);
        OpenCLFunctions.clEnqueueReadBuffer(
            InternalPointer,
            buffer.InternalPointer,
            true,
            0,
            Math.Min(buffer.SizeInBytes, Marshal.SizeOf(typeof(T)) * systemBuffer.Length),
            handle.AddrOfPinnedObject(),
            0,
            IntPtr.Zero,
            IntPtr.Zero
        );
        handle.Free();
    }

    public void EnqueueRange(Kernel kernel, MultiDimension globalWorkSize, MultiDimension localWorkSize)
    {
        MultiDimension offset = new MultiDimension();
        OpenCLFunctions.clEnqueueNDRangeKernel(
            InternalPointer,
            kernel.InternalPointer,
            globalWorkSize.Dimension,
            ref offset,
            ref globalWorkSize,
            ref localWorkSize,
            0,
            null,
            IntPtr.Zero
        );
    }
}

class Buffer
{
    public IntPtr InternalPointer { get; private set; }
    public int SizeInBytes { get; private set; }

    private Buffer()
    {
    }

    ~Buffer()
    {
        OpenCLFunctions.clReleaseMemObject(InternalPointer);
    }

    public static Buffer FromCopiedHostMemory<T>(Context context, T[] initialData) where T : struct
    {
        Buffer result = new Buffer();
        result.SizeInBytes = Marshal.SizeOf(typeof(T)) * initialData.Length;
        int errorCode;
        GCHandle handle = GCHandle.Alloc(initialData, GCHandleType.Pinned);
        result.InternalPointer = OpenCLFunctions.clCreateBuffer(
            context.InternalPointer,
            MemoryFlags.CopyHostMemory,
            result.SizeInBytes,
            handle.AddrOfPinnedObject(),
            out errorCode
        );
        handle.Free();
        return result;
    }
}

class Program
{
    public IntPtr InternalPointer { get; private set; }

    public Program(Context context, params string[] sources)
    {
        int errorCode;
        InternalPointer = OpenCLFunctions.clCreateProgramWithSource(
            context.InternalPointer,
            sources.Length,
            sources,
            null,
            out errorCode
        );
    }

    ~Program()
    {
        OpenCLFunctions.clReleaseProgram(InternalPointer);
    }

    public void Build(params IntPtr[] devices)
    {
        int error = OpenCLFunctions.clBuildProgram(
            InternalPointer,
            devices.Length,
            devices,
            null,
            null,
            IntPtr.Zero
        );
        if (error != 0)
        {
            // On failure, fetch the compiler log and throw it as the exception message.
            int paramValueSize = 0;
            OpenCLFunctions.clGetProgramBuildInfo(
                InternalPointer,
                devices.First(),
                ProgramBuildInfoString.Log,
                0,
                null,
                out paramValueSize
            );
            System.Text.StringBuilder text = new System.Text.StringBuilder(paramValueSize);
            OpenCLFunctions.clGetProgramBuildInfo(
                InternalPointer,
                devices.First(),
                ProgramBuildInfoString.Log,
                paramValueSize,
                text,
                out paramValueSize
            );
            throw new Exception(text.ToString());
        }
    }
}

class Kernel
{
    public IntPtr InternalPointer { get; private set; }

    public Kernel(Program program, string functionName)
    {
        int errorCode;
        InternalPointer = OpenCLFunctions.clCreateKernel(
            program.InternalPointer,
            functionName,
            out errorCode
        );
    }

    ~Kernel()
    {
        OpenCLFunctions.clReleaseKernel(InternalPointer);
    }

    // Buffer arguments are passed as a pointer to the cl_mem handle.
    public void SetArgument(int argumentIndex, Buffer buffer)
    {
        IntPtr bufferPointer = buffer.InternalPointer;
        OpenCLFunctions.clSetKernelArg(
            InternalPointer,
            argumentIndex,
            Marshal.SizeOf(typeof(IntPtr)),
            ref bufferPointer
        );
    }

    // Value arguments (int, float, ...) are boxed and pinned for the duration of the call.
    public void SetArgument<T>(int argumentIndex, T value) where T : struct
    {
        GCHandle handle = GCHandle.Alloc(value, GCHandleType.Pinned);
        OpenCLFunctions.clSetKernelArg(
            InternalPointer,
            argumentIndex,
            Marshal.SizeOf(typeof(T)),
            handle.AddrOfPinnedObject()
        );
        handle.Free();
    }
}

// P/Invoke declarations. "OpenCL.dll" is the Windows library name.
static class OpenCLFunctions
{
    [DllImport("OpenCL.dll")]
    public static extern int clGetPlatformIDs(int entryCount, IntPtr[] platforms, out int platformCount);

    [DllImport("OpenCL.dll")]
    public static extern int clGetDeviceIDs(IntPtr platform, DeviceType deviceType, int entryCount, IntPtr[] devices, out int deviceCount);

    [DllImport("OpenCL.dll")]
    public static extern IntPtr clCreateContext(IntPtr[] properties, int deviceCount, IntPtr[] devices, NotifyContextCreated pfnNotify, IntPtr userData, out int errorCode);

    [DllImport("OpenCL.dll")]
    public static extern int clReleaseContext(IntPtr context);

    [DllImport("OpenCL.dll")]
    public static extern IntPtr clCreateCommandQueue(IntPtr context, IntPtr device, long properties, out int errorCodeReturn);

    [DllImport("OpenCL.dll")]
    public static extern int clReleaseCommandQueue(IntPtr commandQueue);

    [DllImport("OpenCL.dll")]
    public static extern IntPtr clCreateBuffer(IntPtr context, MemoryFlags allocationAndUsage, int sizeInBytes, IntPtr hostPtr, out int errorCodeReturn);

    [DllImport("OpenCL.dll")]
    public static extern int clReleaseMemObject(IntPtr memoryObject);

    [DllImport("OpenCL.dll")]
    public static extern int clEnqueueReadBuffer(IntPtr commandQueue, IntPtr buffer, bool isBlocking, int offset, int sizeInBytes, IntPtr result, int numberOfEventsInWaitList, IntPtr eventWaitList, IntPtr eventObjectOut);

    [DllImport("OpenCL.dll")]
    public static extern IntPtr clCreateProgramWithSource(IntPtr context, int count, string[] programSources, int[] sourceLengths, out int errorCode);

    [DllImport("OpenCL.dll")]
    public static extern int clBuildProgram(IntPtr program, int deviceCount, IntPtr[] deviceList, string buildOptions, NotifyProgramBuilt notify, IntPtr userData);

    [DllImport("OpenCL.dll")]
    public static extern int clReleaseProgram(IntPtr program);

    [DllImport("OpenCL.dll")]
    public static extern IntPtr clCreateKernel(IntPtr kernel, string functionName, out int errorCode);

    [DllImport("OpenCL.dll")]
    public static extern int clReleaseKernel(IntPtr kernel);

    [DllImport("OpenCL.dll")]
    public static extern int clSetKernelArg(IntPtr kernel, int argumentIndex, int size, ref IntPtr value);

    [DllImport("OpenCL.dll")]
    public static extern int clSetKernelArg(IntPtr kernel, int argumentIndex, int size, IntPtr value);

    [DllImport("OpenCL.dll")]
    public static extern int clEnqueueNDRangeKernel(IntPtr commandQueue, IntPtr kernel, int workDimension, ref MultiDimension globalWorkOffset, ref MultiDimension globalWorkSize, ref MultiDimension localWorkSize, int countOfEventsInWaitList, IntPtr[] eventList, IntPtr eventObject);

    [DllImport("OpenCL.dll")]
    public static extern int clGetProgramBuildInfo(IntPtr program, IntPtr device, ProgramBuildInfoString paramName, int paramValueSize, System.Text.StringBuilder paramValue, out int paramValueSizeReturn);
}

delegate void NotifyContextCreated(string errorInfo, IntPtr privateInfoSize, int cb, IntPtr userData);

delegate void NotifyProgramBuilt(IntPtr program, IntPtr userData);

enum DeviceType : long
{
    Default = (1 << 0),
    Cpu = (1 << 1),
    Gpu = (1 << 2),
    Accelerator = (1 << 3),
    All = 0xFFFFFFFF
}

enum MemoryFlags : long
{
    ReadWrite = (1 << 0),
    WriteOnly = (1 << 1),
    ReadOnly = (1 << 2),
    UseHostMemory = (1 << 3),
    HostAccessible = (1 << 4),
    CopyHostMemory = (1 << 5)
}

// Caveat: OpenCL reads these sizes as size_t. With one dimension and Y == 0,
// X and Y together still read as the intended value on 64-bit little-endian
// platforms, but this layout is not safe for 2D or 3D ranges there.
struct MultiDimension
{
    public int X;
    public int Y;
    public int Z;
    public int Dimension;

    public MultiDimension(int x)
    {
        X = x;
        Y = 0;
        Z = 0;
        Dimension = 1;
    }
}

enum ProgramBuildInfoString
{
    Options = 0x1182,
    Log = 0x1183
}
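
The example above still waits inside Update(), because the read is a blocking one. Since the original question was about keeping the main thread responsive, here is a rough sketch of one way to push a blocking call onto a background thread and collect the result on a later frame. It is an illustration, not part of the posted code: BlockingGpuCall is a hypothetical stand-in for the EnqueueRange plus ReadBuffer pair, all names are made up, and it assumes a scripting runtime where Task.Run is available (.NET 4.x).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: BlockingGpuCall is a hypothetical stand-in for the
// EnqueueRange + ReadBuffer pair from the example above.
public static class BackgroundDispatchSketch
{
    // Results produced on the worker thread, consumed on the main thread.
    static readonly ConcurrentQueue<float[]> Completed = new ConcurrentQueue<float[]>();

    // Mimics the kernel (items[id] = number * id) after a simulated delay.
    static float[] BlockingGpuCall(int number)
    {
        Thread.Sleep(50); // pretend the device is busy
        var items = new float[3];
        for (int id = 0; id < items.Length; id++) items[id] = number * id;
        return items;
    }

    // Call from the main thread; returns immediately.
    public static Task Dispatch(int number)
    {
        return Task.Run(() => Completed.Enqueue(BlockingGpuCall(number)));
    }

    // Call once per frame; returns false while the work is still running.
    public static bool TryGetResult(out float[] result)
    {
        return Completed.TryDequeue(out result);
    }

    public static void Main()
    {
        Dispatch(7);
        float[] result;
        int polls = 0;
        // Each loop iteration plays the role of one frame.
        while (!TryGetResult(out result))
        {
            polls++;
            Thread.Sleep(5);
        }
        Console.WriteLine(string.Join("; ", result) + " (after " + polls + " polls)");
    }
}
```

In Unity, TryGetResult would be called from Update() or from an EditorApplication.update callback, so that Debug.Log and other Unity API calls stay on the main thread. Whether the OpenCL calls themselves are safe to issue from a second thread in a given driver is worth checking against the OpenCL specification before relying on this.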