runtime ai scripting in burst

nijnstein · Aug 19, 2021

hi, anyone any experience with a light weight scripting engine that can be used from burst code with only minimal symantics? Im currently developing something that compiles a c-like script into a struct with a fixedbyte[512] for the code segment (8 bit words) and a fixedfloat[16] for the data stack. (enough for what i want for now) with a simple interpreter im able to use it in burst jobs and while i havent tested any performance it feels very fast.. currrently i have support for scripts with simple arithmetic and im adding support from the functions from unity.mathematics but i guess after that ill be adding the next thing...

it feels like reinventing the wheel. what do you guys use in burstable code? everything i seem to find looks way to heavy and impossible or not performant enough to use from burst?

Ashkan_gc · Aug 19, 2021

The data structures required for script execution are usually full of pointers and ASTs and string manipulation, parsing and ... you will probably will not benefit too much from coding the interpreter in burst, however it can create a set of assembly like instructions to be executed by a burst runtime.

nijnstein · Aug 19, 2021

i compile the script into bytecode much like assembly, the burst interpreter for that bytecode is very simple. no pointers, no strings and no variable length structures in the native side of things:

Code (CSharp):

struct BurstScriptExecutable

{

/// <summary>

/// unique script id

/// </summary>

public int Id;

/// <summary>

/// byte code compiled from script

/// </summary>

public fixed byte code[512];

/// <summary>

/// size of code in bytes

/// </summary>

public int code_size;

/// <summary>

/// index into bytecode, next point of execution, if == code_size then end of script is reached

/// </summary>

public int code_pointer;

/// <summary>

/// index into dataregister, starts from data[start_stack_index]

/// </summary>

public int stack_pointer;

/// <summary>

/// input, output and scratch data fields

/// </summary>

public fixed float data[32];

/// <summary>

/// nr of data registers

/// </summary>

public int data_size;

}

performance is already as i want it in burst jobs, its only that it feels like im making stuff already made.. if not ill probably put in on github or something when ready.. but id rather use a proven solution.

Using the burst as a jit like how it would be possible with c# to compile in runtime into il seems not possible for now, id really like to be able to do that. I could outout IL instead of bytecode but could i have the burst compiler compile it in runtime and give me back a function pointer?? no idea

Ashkan_gc · Aug 19, 2021

It is not likely that people built scripting runtime in burst or at least built and shared it online too. Less than 100 real companies probably are using it directly at this moment and most probably only for calculation heavy stuff.

nijnstein · Aug 21, 2021

got it working and its fast, 10 million times running a script is so fast i had to check my timins tripple and again:

my results for the following test:

#bscript test
a = 1 * 2; # a = 2
b = -1 * 2; # b = -2
c = a / b; # c = -1
d = c * 100; # d = -100
e = c + 1 * 100; # e = 0 (no muplication rules yet - should be simple node-reorder)
f = c + (1 * 100); # f = 99

This is compiled in into 44 bytes of bytecode and a stack of 9 floats and put into the struct in the previous post, some prelimenary results in the editor: (i9900k, gtx1080)

(note that on the 2nd run compilation is largely skipped)

im suprised that even with a clumsy interpretor i get better performance then i wanted i guess my cpu likes that everythin fits in a single cache line

next ill be adding some control flows, vectors (float3) and functions and it should be usable.

anybody an idea how to have burst compile a jump table from a switch statement on a byte enum? i guess that theres some codestyle to adhere to for ncpp to convert it in a jumptable.. it wont for now.. its the biggest slowdown in the interpreteror and using arrays of function pointers.. i dont know if and optimizer will benefit from that

10.000.000 executions of this in less then 3 ms in editor, i wasnt expecting that even for such simple script

edit: results in runtime are even more dramatic, 0.1ms in burst over 16 cores, for 10M executions of the above script:

nijnstein · Aug 21, 2021

i think i will have it support something like this, it is based on a tokenizer but there is no grammer (intentionally)

Code (CSharp):

#

# burst script test 6

#

# - no declaring variables, this is not needed

# - NO SCOPE

# - no strings

# - no allocations

a = 1 * 2; # a = 2

b = -1 * 2; # b = -2

c = a / b; # c = -1

d = c * 100; # d = -100

e = c + 1 * 100; # e = 0 (todo - reorder nodes in parsetree)

f = c + (1 * 100); # f = 99

#

# functions

#

g = abs(c) * f + (max(a, b, c, d) * 4)

#

# conditions

#

w = (a > b) & (c < d) ^ (e > f); # note single char operators, executed left to right

w = !w;

#

# control flow

#

if(b > 0)

f = 16;

else

f = 4;

if(f = 4) {

h = 50 - min(1, 2, a, b, c, d, e, f) * 2;

i = 1;

}

else

{

h = 1

i = 1

}

while( a > b ) # with support for break and continue

{

b = b * 2;

a = b;

}

#

# jumps

#

@label1:

goto label1; # i like goto

jump_z(a, label1); #jump if a = 0 to label1

jump_gz(a, label1); # jump if a greater > 0 to label1

jump_nz(a, label1); # jump if a not zero to label1

jump_lz(a, label1); # jump if a smaller < 0 to label1

#

# vectors

#

j = [1, 2, 3];

k = [3, 2, 1];

m = j * k;

n = [1, 2, 3] * 4;

#

# vector ops

#

o = normalize(n);

p = min(n); # should return smallest element

q = min(n, m); # should return smallest vector n or m

r = min_arg(n, m); # should return smallest element of each vector as a [,,]

#

# functions

#

function add1(a) { return a + 2; }

function add2(a) { a = a + add(2); a = a + add(1); return a; }

s = add2(1))a

#

# signals

#

signal_a = 80808080;

set(signal_a, 1);

t = wait_for_signal(signal_a); # will yield until signal > 0

#

# yield

#

yield; # return back execution untill next frame

yield a; # yield for n frames

return; # return indefenately

#

# stack stuff (more the 32 (config) push will result in allocations by the interpretor

#

pop a

push a

peek a

#

# constants

#

u = deltatime;

v = elapsedtime;

external functions can be added as compiled fragments of bytecode for the interpretor based on the code the compiler finds, or externally from some library of scripts while some will be built in (math functions etc), float3 and quaternions will be compatible with vectors of size 3 resp. 4

what do other people think i really miss? scoping inside a fragment like this is probably something i dont want/need/see much use for it as functions will be executed in a different context but hey maybe other poeple really want it.. there are some performance issues implementing that while it ass little for simple scripts

strings and allocations wont be supported

Ashkan_gc · Aug 21, 2021

Depends on users of your scripts but probably having jumps and goto and jze and ... is not a good idea if those who use this are designers/high level programmers
if you are compiling a higher level language to this byte-cocde then my point is mute of course.

nijnstein · Aug 22, 2021

during startup it compiles in managed code something like:"

Code (CSharp):

f = 1 + 2 * 2 + 5;

into:

Code (CSharp):

001 128 017 130 129 002 131 000

readable:

Code (CSharp):

assign <f> fma #2 #2 #1 + #5 nop

it correctly re-ordered for multiplication while analysing the parsetree, then the optimizer used pattern recognition to replace 'a + b * c' (in any order) into fma b c a , this results in a 8 bytes of bytecode and only 3 switches because it can 'prefetch' parameters for an operation like fma (from 5 to 1 switch). this is getting fun i could also void the whole call because the parameters are constant but i dont want to make that optimization yet, ill save that for last.

guess ill add if/then for and while loops as nobody i asked knew what jz ment...

burst blasts through the bytecode this and attaching the resulting scripts with ibufferelementdata entities gives me a fast and clean solution for having millions of entities with unique behaviour (if i could only render that on my 1080)

i have a hard time myself believing my own benchmarks, note that this is for 10M executions of the script in burst.

nijnstein · Aug 23, 2021

burst really likes scripting like this, the cache would probably glow red hot on running it 10m times... now i need to find out a nice method to get the data from IComponentData... i could point the stack and map fields without allocations or copies but that seems nasty..

functions, controlflow, i dont really need it right now but they will follow

Code (CSharp):

Reading scriptfile: ai/scripts/test5

CODE:

a = 1 * -2; # -2

b = -1 * 2; # -2

c = a / b; # 1

d = c * 100; # 100

e = 1 + 2 * 2; # 5

f = 1 + 2 * 2 + 5; # 10

g = 1 + 2 / 6; # 1.3333

h = 1 + -2 / 6; # 0.6666

i = 1 - 2 * 2 + 5; # 2

j = 1 + 2 * 2 - 5; # 0

k = 1 +-2 * 2 + 5; # 2

COMPILE: 13,0137 ms

CODE SEGMENT: 83 bytes

001 128 129 005 003 130 000 001 131 003 129 005 130 000 001 132 128

004 131 000 001 133 132 005 134 000 001 135 017 130 130 129 000

001 136 017 130 130 129 002 137 000 001 138 017 130 144 129 000

001 140 018 130 144 129 000 001 141 018 130 130 129 002 137 000

001 142 017 130 130 129 003 137 000 001 143 018 130 130 129 002

137 000

ASSEMBLY:

set <a> #1 * - #2 nop set <b> - #1 * #2 nop set <c> <a> / <b> nop set <d> <c> * #100 nop set <e> fma #2 #2 #1 nop set <f> fma #2 #2 #1 + #5 nop set <g> fma #2 #0,166666672 #1 nop set <h> fsm #2 #0,166666672 #1 nop set <i> fsm #2 #2 #1 + #5 nop set <j> fma #2 #2 #1 - #5 nop set <k> fsm #2 #2 #1 + #5 nop

DATA SEGMENT: 17 floats

var 1 name = a value = -2 refcount = 2

var 2 constant value = 1 refcount = 18

var 3 constant value = 2 refcount = 28

var 4 name = b value = -2 refcount = 2

var 5 name = c value = 1 refcount = 2

var 6 name = d value = 100 refcount = 1

var 7 constant value = 100 refcount = 2

var 8 name = e value = 5 refcount = 1

var 9 name = f value = 10 refcount = 1

var 10 constant value = 5 refcount = 8

var 11 name = g value = 1,33333337 refcount = 1

var 12 constant value = 6 refcount = 1

var 13 name = h value = 0,6666666 refcount = 1

var 14 name = i value = 2 refcount = 1

var 15 name = j value = 0 refcount = 1

var 16 name = k value = 2 refcount = 1

var 17 constant value = 0,166666672 refcount = 4

BENCHMARK: running 10.000.000x

execution benchmark (single thread c#): 27.149 ms

execution benchmark (parallel burst): 0.1333 ms

the referencecounts for constants are bugged... they are mainly used to remove divisions and replace the initial value with its inverse

nijnstein · Aug 28, 2021

as expected im getting carried away a bit with stuff, this simple script:

Code (CSharp):

input a 0 4;

output a 0 4;

a = a + 1 * (1 / 2) * 2;

will convert into the following 11 bytes of bytecode:

Code (CSharp):

001 128 128 002 031 004 131 131 133 132 000

// set <a> <a> + mula / #1 #1 #0,5 #2 nop

through several steps, the input code is parsed into a token tree, then an analyser gets to check and optimize nodes according to some rules after which the compiler will output bytecode. The bytecode is then further optimized using pattern recognition. In this snippet the following things happen after parsing the token tree:

1. analyser ensures order of operation

Code (CSharp):

a = a + (a * (1 / 2) * 2)

2. the division is converted into a multiplication with its inverse as the division was determined to be with a constant

Code (CSharp):

a = a + (a * (1 * 0.5) * 2)

note that the analyser will use reference counting to either replace the original value in the stack or to create a new constant value on the interpretor stack.
3. the analyser removes unneeded parenthesis

Code (CSharp):

a = a + (a * 1 * 0.5 * 2)

4. the optimizer kicks in and sees multiple multiplications and an addition and replaces it

Code (CSharp):

a = a + (mula(a, 1, 0.5, 2))

5. finally, the parenthesis are not needed anymore as now the multiplications are 1 function and operator precedence is not a problem anymore

Code (CSharp):

a = a + mula(a, 1, 0.5, 2)

the interpretor can now very efficiently produce an execution path in runtime/burst from the 9 bytes of code the script resulted in

Code (CSharp):

128 128 002 031 004 131 131 133 132

The assingment statement 'set' and all 'nop's are not actually needed to correcly execute any code the interpretor outputs.

This all works better then expected and theres a lot more then this... i keep finding methods to improve the execution of the script by the interpretor.

The input/output at the start of the script defines the mapping for io vars, currently it directly maps with pointers
but it should become optional to copy or direct reference, bytesize is used not to determine the parameter types as those are inferred from their usage, it is used to determine if all parameters can be set on starting the script using a single native memset as they would be located in one block

Code (CSharp):

input <id> [offset] [bytesize]; // map identifier to memory offset from the data used to run the script on

output <id> [offset] [bytesize] // map identifier back

Next up is control flows and vector types.

Some results:

The following code compiles into 59 bytes and a stacksize of 13 floats (this could be a lot smaller but it is not optimized yet), i takes 0.33ms on average to compile in single threaded c#

Code (CSharp):

a = a + 1 * (1 / 2) * 2;

b = 2 * (2 * 2 * 2) * 2;

c = 2 - 2 - 2 - 2 - 2 - 2;

x = 1 & -2 | 1;

y = 1 & 2;

z = 1 & 2 & 3 & 4 & 5;

and results in the following benchmark:

1st run

2nd run

that is 10M times executed in 0.014s

just maybe ill get scripting paralel in burst just as fast or faster as doing it direcly in c# without script or burst, thats just awesome. A really large optimization is that burst makes a jump table of all switches as in c# i wont get it compile like that.

- note that for testing it evaluates expresions leading to non-referenced values aka useless calculations are performed, this is for evaluation purposes only, in release you would never calculate anything as not a single value is returned via output.

nijnstein · Aug 28, 2021

- should be 10m times in 0.016 ms, i keep getting a spam error on editing the post

- the second run has more stuff in close in memory, it shows how with ecs patterns it becomes much easier to have efficient memory acces by default

nijnstein · Aug 30, 2021

its comming together, vectors are in:

Code (CSharp):

# define a float4

f = (1 2 3 4);

# perform scalar * vector * scalar operation

g = 2 * f * 2;

COMPILE: 0,4084 ms

CODE SEGMENT: 15 bytes

001 128 132 133 134 135 000 001 136 133 005 128 005 133 000

ASSEMBLY:

set f 1 2 3 4 nop set g 2 * f * 2 nop

DATA SEGMENT: 12 floats

var 1 name = f [4] value = 1 2 3 4 refcount = 3

var 2 constant value = 1 refcount = 2

var 3 constant value = 2 refcount = 6

var 4 constant value = 3 refcount = 2

var 5 constant value = 4 refcount = 2

var 6 name = g [4] value = 4 8 12 16 refcount = 1

A vector is deduced from the data following an operation.

Most trouble was with the optimizer to check data sizes of variables while matching on bytecode patterns, the interpretor has very little knowledge of it other then it builds from the byte and a float[] with data. each operation > 128 is actually the offset (-128) into that table, maybe its best to have code and data share the same fixedbyte[] (with data using memorya backward offsetted from end index) as for the same sized icomponentdata the range of supported scripts can be larger or the number of parameters/stack vars can be larger for the same sized data block;

Im not sure if it should be just 1 IScriptData with a preset size defined per project, or if i should have different IComponentDatas with different sizes.. for me 192 (i think i can get away with 64 bytes total for most usages for me, remember, unique script/entity) byte of shared code/data memory would probably be more then enough. Chunks like the smaller ones but i really want to test soon how much structure size influances performance.

next up:
- constant vectors, could ditch most of the scalars then from stack (only the '2' remains)
- functions, math utilities mosly, anything to get smaller code for repeating tasks would be nice too.. option to map a few byte opcodes to native function pointers for flexibility or is that way too much..
- control flow, ifthen, while etc.
- small constant values should use non-used operand and varid opcode space, these would then not needed to be fetched but it would only work for small integers and booleans; the interpretor should be able to figure it out by using 1 byte as index for the id operation (128 now)
- booleans, its very tempting to use 1 64 bit datatype and intrinsics to manipulate it and make 64 bools available to the script for the cost of 4 bytes of memory, boolean operations work but they use a float as backing field..
- lot of testing.
- it wont work on mobiles as i used sse intrinsics in the interpretor and plan to use it a lot more for vector math, i dont know if a neon code path would be nice or just a safe version for burst to figure out... too much questions time to sleep and work tomorrow

nijnstein · Aug 30, 2021

i guess my biggest question is how to organize this nicely for reuse, when it has a data stack and it shares chunk memory space with your entities then it seems very wastefull to have duplicates of data being copied to and from the script, pointing to everything would force unsafe code (it is now but should not be later), having properies on each script means probably generating code to compile with your project for easy data access if the only place it is stored is in the script's stack..

nijnstein · Sep 5, 2021

small update

Code (CSharp):

[B]CODE:[/B]

aa = 1 + 1 * (1 / 2) * 2;

ab = 2 * (2 * 2 * 2) * 2;

ac = 2 - 2 - 2 - 2 - 2 - 2;

ax = 1 & -2 | 1;

az = 1 & 2 & 3 & 4 & 5;

aw = 1 * 10 * 3 * (3 + 4 * 2);

[B]CODE SEGMENT: 72 bytes[/B]

024 018 037 004 085 085 107 086 019 001 128 085 002 025 000 001

129 037 005 086 086 086 086 086 000 001 130 039 086 086 086 086

086 086 000 000 001 131 085 006 003 086 007 085 000 001 132 040

005 085 086 087 088 133 000 024 018 033 088 086 087 019 001 134

037 003 085 090 087 005 025 000

[B]ASSEMBLY:[/B]

push ( mula / 1 1 0,5002 ) set aa 1 + pop nop set ab mula * 2 2 2 2 2 nop set ac suba 2 2 2 2 2 2 nop nop set ax 1 &- 2 |1 nop set az all * 1 2 3 4 5 nop push ( fma 4 2 3 ) set aw mula - 1 10 3 * pop nop

[B]DATA SEGMENT: 7 floats, 28 bytes[/B]

var 1 name = aa value = 2 refcount = 1

var 2 name = ab value = 32 refcount = 1

var 3 name = ac value = -8 refcount = 1

var 4 name = ax value = 1 refcount = 1

var 5 name = az value = 1 refcount = 1

var 6 constant value = 5 refcount = 1

var 7 name = aw value = 330 refcount = 1

all nesting is converted into linear execution using a stack, this wil void all recursive paths in the interpretor so there is no cost in maintaining function stacks by the os for deep recursive calls, there is some extra stack use but that is lighning fast compared to a function call nesting yet another compound iteration. From now on the interpretor follows a completely flat execution path.

constant values are reference counted and the most common are encoded in opcodes, saving data space while costing no additional code space, a future update will allow an extended instruction to load constant vectors from the interpretor, i bet that by analysing all scripts in a project the compiler can make a very neat list of 255 vectors[n] that are used most often and make sence as a constant encoded via an opcode; this will save a lot of memory in the data segment

nijnstein · Sep 14, 2021

Hard too see in bytecode but this is the result for a nested If then / while and a switch statement, flattened and with understanding of vectors of max size 4 x 63:

Code (CSharp):

set aa 2 nop push ( aa - 1 ) jz @jmp_0 25 ( aa & pop nop ) ( set aa 42 nop set ab mula yield 42 1 1 1 nop ) jump 6 @jmp_1 ( @label_0 set aa 2 nop ) @label_1 set ab 1 nop push @label_3 ( 2 - 10 ) jz 23 @jmp_2 ( aa > pop nop ) ( set aa aa - 1 nop set ab ab + aa nop ) jump_back 29 @jmp_3 jz @label_2 13 @jmp_4 ( ab >= 826 ) ( set ac 2020,2020 nop jump 22 ) @label_4 jz 14 @jmp_5 ( ab >= - 34 ) ( set ac 1010,1010 nop jump 6 ) @label_5 set ac - 100 nop

and its getting fast.. i cant run a benchmark (next update) at the moment as im completely refactoring all code (it became a bit messy) and i forgot some 'minor' thingy... indexers like . and [], after that its time to start thinking about packaging blobs and making it simple to use. We could seperate code data and stack and depending on the task at hand their final memory location might be very important for maximized performance so i really want to have decent control about layout and destinations in several configurations. You might want to run script code from a manager using only data from a chunk and a stack based on a local stackallock [], or you might want something completely different.

Then finally for now, the kicker: 2-n wide execution, with simple scripts and very, very many of them we are bound to be able to sort and select execution paths were only data changes, and we should not have to do that every frame akin to how the ecs systems organize themselves, we should be able to burst through that as long as scripts dont use conditional flow control. (and even then, on an unlikely jump that mismatches, we could just take out that 1 path and run it single. The interpretor is flat for ease of coding that, i dont see any problem not solvable but i doo see many fun ones

My purpose was to build somekind of tool to run many changing behaviours as fast as possible with the behaviour determined elsewere but i guess it will still be a while.. there are many topics on maintaining data structures for changing behaviour and they are all complicated because they want AND performance AND flexibility AND are prone to change during development AND with the rigid setups needed to run fast code though c# can be extremely time consuming and not very rewarding given the time spent on it. With a very fast scripting language i hope to solve that problem for my company as i can have just some instuctions at each node based on some higher level representation elsewere. When code is deciding it can always use the higher level repros but on stepping each frame many small scripts in chunks seem a clean solution somewhere midway, with the benefit you can completely pull out and replace its content without code changes. it all becomes data seperate from other processes. We will probably release a compiler/interpretor package to the store in a while for the price of a hamburger or something.. we will also be bringing in a compiler specialist who will help improving the compiler before releasing it as we damn wel be sure its output is always correct and/or validated), that wont be free im afraid.

nijnstein · Sep 26, 2021

small update for anyone reading;

I started to make the parts that make it usable, its purpose is to on demand change functionality, either from input, ai or some server. While the interpretor got very fast, i will never be as fast as native code and one big drawback of using a script interpretor for core functionality is performance, most parts of the scripts will be static and running it with an interpretor would be a waste of cpu-cycles, hard coding it would mean making the code more complex if you want to keep supporting on demand scripts.

For this reason the scriptcompiler has some stages added that allow it to output c# burstcompatible code compiled from the scripts during design time. It will use functionpointers to call the native burst code if the script was known at compile time.

This has several nice advantage's: a) speed, it is much much faster, b) we can treat all scripts equal regardless what they are compiled into resulting in clean and compact code.
We can just use the same method everywere and still update parts during runtime without losing performance or getting complex code paths.

At some point a static repo is kept which is scanned with reflection for scripts:

Then we can use them everwhere from a manager:

Code (CSharp):

int res_hpc = mgr.Execute(script_id, data);

Performance is very enjoyable (note the warmup on the first run, each is 1000x, with time div 1000):

As expected, the interpretor is much slower, needing 0,141 ys for 1 iteration, while the native bursted version takes about 0,004ys . (EDIT: the benchmark is off.. il2cpp replaced the whole script with a packed move.. forgot it applies constant analysis and i dont for testing... time is higher but still very much faster, as would be expected) The interpretor will get faster by executing multiple equal scripts with different data wide but that will probably halt at 8x speedup and depening on the jumps in the script it might take some instruction padding (nops) to keep all in sync if even possible (many conditional jumps will be hard to solve and it probably will be useless so..), otherwise it will drop paths forcing them back in single execution.

It might seem wierd at first glance to first do all this trouble to end up with c# code again but this ensures we always have the same behaviour and most importantly: there is no drawback to using the script generally when only small parts are expected to change during runtime, and in a weekly application update important script might become hardcoded f who knows, in the meanwhile a lot of clients on an older code base while still executing the same functionality although with a (in real life) neglegible performance hit.

The script used in this test is very simple, 4 float multiplication statements, the compiler stage to c# isnt very finished as i wanted to see it work first :- also the handling of data could use some love, i guess we need to play a little with it in order to find out the nicest way to use it, ideally it would just read/write into a component data and just use a pointer to it but there are more usecases.. this needs some time toying with it.

Guedez · Sep 26, 2021

What is the data layout for the scripts like? Is it trivial to serialize and deserialize the scripts mid execution during runtime?

nijnstein · Sep 26, 2021

all data is put in a byte array as follows:

Code (CSharp):

/// [....code...|...variables&constants...|.datasizes.|...stack.....................]

the package when compiled into 1 combined segment looks like this, but you can have each segment seperate so if you want you can have your code in some global repo and the data tightly packed in componentdata (or the other way around), for saving space, the interpretor can use a stack local to its stack saving space as to fit more datasegments into 1 chunk. (you cant use yield in that situation because it uses the stack to preserve state) We want it to be as flexile as possible but at the same time we dont want 1 big mess so this is kinda open atm in how to do this cleanly for all users.

Code (CSharp):

public struct BlastPackageInfo

{

public BlastPackageMode package_mode;

public byte allocator;

public byte unused_1;

public byte unused_2;

public ushort code_size;

public ushort stack_size;

public ushort data_size;

public ushort package_size;

}

public struct BlastPackage96

{

public BlastPackageInfo info;

unsafe public fixed byte segment[96];

}

Guedez · Sep 27, 2021

I see, so it should be very easy to save it all, be it runtime code or not.
I wonder:

nijnstein said: ↑

For this reason the scriptcompiler has some stages added that allow it to output c# burstcompatible code compiled from the scripts during design time. It will use functionpointers to call the native burst code if the script was known at compile time.
Click to expand...

Now that you can import runtime burst assemblies, couldn't this be done in an unloaded assembly at runtime and then load it? I think it was introduced in 2021.1?

nijnstein · Sep 27, 2021

I saw there is support now for adding unloaded assemblies but i dont see a way to compile with burst at a client, we use csc.CompileAssemblyFromSource on windows (some testing path), could possibly use compilerservices in mono builds but it will be plain c#. we have codepaths that use the .net c# compiler to generate il in runtime for the scripts compiled ast instead of byte code but for now it will only function on windows and compile time is so slow that it gives management trouble we dont want now, in cases it takes seconds to compile a script in runtime and to justify that it would need to execute said script many 1000s of times. Its gonna be a tradeoff and at this point in the project we are looking how we can have a consistant interface for all of this.

to be honest we have assumed that using the burst compiler from runtime is impossible and havent even looked into it

update: if i only assign a constant in a test script the difference between interpretation and native is only about 15x. with wide execution we can probably get very close. it will be interesting to see what normal managed c# would do with the same script, can wide execution of small scripts be faster then managed c#? i think/hope so. but that test will take some time as it will need most components finished

nijnstein · Jan 15, 2022

after some sideroads its finally becomming a solid library, 1 feature left to complete and ill put it up in the asset store. I guess nobody will guess that feature no one did so far although its in plain sight

i guess the biggest time eater in this was attemting to make a fitting solution for using it in unity only to realise im not a game developer and i wont know how its used. So i went back to the core and will deliver on the bare metal works for others to make fitting solutions.

EDIT: well everbody who sees SSMD thinks its a spelling error for SIMD..

i tried to make that work but it wont easily. instead it assumes that in a massive simulation with millions of scripts running that some scripts, probably many are the same with different data. So SSMD stands for Single Script Multiple Data.

Its about 33x faster on a single thread without any further optimizations regarding vectorization nor burst branches make life a lot less perfect however so mitigating these for the most part and handling outliers is the last thing left to finish.

nijnstein · Jan 17, 2022

this is getting insane, with burst disabled im getting native speeds running things in large groups

1,2 ns for the expression a = b * c;

end up being memory bandwidth limited at only 4 threads..

With burst in unity :

nijnstein · Jan 25, 2022

inching towards a release into the asset store, i never though it would be so hard to say STOP that will be in version 2 but it has been done and somewhat feature limited version of BLAST will hit the asset store very soon, below is a demo of the scene that will be provided with the package, now lets hope someone other then me is interested in using this

Bivens32 · Jan 25, 2022

I am definitely interested. Although I don't have any time to mess around with it rightnow. I have a DOTS based visual scripting solution in the works and I feel my tool will need something like what you have created for it to have maximum flexibility. Maybe if I ever get mine working, I can try and combine the them.

nijnstein · Jan 25, 2022

Bivens32 said: ↑

I am definitely interested. Although I don't have any time to mess around with it rightnow. I have a DOTS based visual scripting solution in the works and I feel my tool will need something like what you have created for it to have maximum flexibility. Maybe if I ever get mine working, I can try and combine the them.
Click to expand...

Interesting, ive been thinking about visual scripting too, in my view it would generate the nodes that normally would be generated from the text in the solution and compile that into bytecode for executing it against a library of function pointers that make up the api.

Bivens32 · Jan 25, 2022

nijnstein said: ↑

Interesting, ive been thinking about visual scripting too, in my view it would generate the nodes that normally would be generated from the text in the solution and compile that into bytecode for executing it against a library of function pointers that make up the api.
Click to expand...

That sounds complicated to me, but I am sure you can do it. Haha. My solution is much higher lvl api wise. My tool is is meant to control when ecs systems turn on and off by providing a way to combine boolean conditions in any way you want. It's more of a control system than a granular visual programming system if that makes sense.

nijnstein · Jan 25, 2022

luckily it only sounds complicated, it should become simple to use blast, for example, if you can write a burstcompatible function to enable/disable systems and register it then it would be simple to accomplish:

in c#:

Code (CSharp):

Blast.Register(TurnOff, TurnOn); // psuedo code to register some function with the scripting api

Blast.Execute(control.bs, componentdata);

The componentdata maps to the datastack of the script, where control.bs would be script like:

Code (CSharp):

if(condition) then TurnOn(parameters); else TurnOff(parameters);

the only caveat is that my current Blast.Register procedure is far from easy atm. Thanks for your comment it got me thinking into how others might be using blast.

MicCode · Jan 31, 2022

This looks really interesting, probably will pick it up once it hit asset store.
Seems useful for game with some programming or for light modding.
How would it interact with the ecs system?

nijnstein · Jan 31, 2022

if you look at this script and then parameters deduced from it in the blue field, these align in memory and if used with ecs blast should be set to packagemode = ecs where it will use the icomponentdata associated with this script (in the system it takes a pointer to it).

The icomponent data would have to be handcrafted though to match the data (or just use fixed float arrays, and us e the blastpackage to translate data in/out.

Maintanance functions for entities are natively coded in the script api and can be used from script code as if they were math functions. These are to be implemented with the external function support in blast (and at the moment isnt ) and can be extended at design/runtime by users.

From there many different implementations can apply depending on whats wanted, im will be making some examples for it in the next few weeks before finally releasing, a statemachine in dots, maybe a particle system too as that would show its power very well and some more simple usecases for gameobjects as well.

MicCode · Jan 31, 2022

nijnstein said: ↑

im will be making some examples for it in the next few weeks before finally releasing, a statemachine in dots, maybe a particle system too as that would show its power very well and some more simple usecases for gameobjects as well.
Click to expand...

That sounds great, demo with use case will definitely make the tool easier to understand and integrate. Looking forward to your release.

davenirline · Feb 1, 2022

Is there a language reference?

nijnstein · Feb 1, 2022

i started writing one at https://github.com/nijnstein/BLAST-Documentation/blob/main/LanguageReference.md it isnt finished though, ill try to get it somewhat complete in the following days

davenirline · Feb 1, 2022

nijnstein said: ↑

i started writing one at https://github.com/nijnstein/BLAST-Documentation/blob/main/LanguageReference.md it isnt finished though, ill try to get it somewhat complete in the following days
Click to expand...

I don't see support for functions yet. If there is, this would be great!

nijnstein · Feb 1, 2022

"I don't see support for functions yet" some small section has been added but i will elaborate on how to extende the script api with user defined functions there.

nijnstein · Feb 1, 2022

In my (past) view those were always kindof predefined and have no room in script and initially i wanted to implement them exclusively in c# and register them as an external function and call them from script for performance.

That said; it would be nice to have so it will definitively be added after the first release.

nijnstein · Feb 1, 2022

davenirline said: ↑

I meant functions in the scripting language itself.
Click to expand...

you know this is why it takes so long for me to finally release i cant go to sleep tonight thinking about functions not being supported initially

nijnstein · Mar 22, 2022

well latest game releases didnt help with devtime.... at 2 MB of c# code im finishing of the documentation, lets hope unity accepts the blast package in its store

nijnstein · Mar 30, 2022

the first version has been accepted by unity its limited in scope but im sure its usefull in some cases

https://assetstore.unity.com/packages/slug/210942

nijnstein · Apr 14, 2022

version v1.0.4c finally succeeds in the plan originally put down: execute many scripts compiled at runtime in simd instructions, finally i get back to using dots again to put it to work where it belongs as its so much easier to have very many objects at once

The script: a = a * b; has the following assembly executed after some indexing magic, it returns after needing about 1.5ns per dataset in a 64 wide array.

Now the real pain starts.. making this easy to use for a wide range of things, the first person to use it used it to transfrom geolocation data from csv files and.. well i would have never guessed that i would really like some input on how people would use a scripting language in their project, especially one that could make their code faster.

I use it because i like to torment myself with a debugger (i really do), that voids me as a good use case..

sinaari · Aug 14, 2023

Hey, this is a very impressive project! How is it going?
I notice the Github project was not updated for a while, but on the other hand it seems to be pretty well set already.
What are your plans regarding BLAST further?
I'm still researching it, but it looks very interesting.

I did have an idea to create a DOTS-based script interpreter for a while, but I absolutely didn't anicipate that someone already tried it, and it turned out so great tbh.
I was thinking more about something like Lua or Scheme, for general scripting, mainly to avoid Unity compilation time delays and be able to work with DOTS. And of course such language would've been much slower than the succinct computation cores like you demonstrated with BLAST. But! I imagine that if such language was centered around BLAST, as an additional external layer maybe, then it could still have that power for the cases when you need it.

BTW, I came across BLAST absolutely by accident, searching for some assets that are not in any way related to something like it. Neither Google nor the Asset Store ever showed it to me even when I was directly searching for "Unity scripting", "Unity DOTS scripting", and so on quite a lot.

sinaari · Aug 22, 2023

I'll just leave a link here to a post about BLAST that has all the links to useful related pages, for quicker reference:
https://forum.unity.com/threads/blast-performance-runtime-scripting-in-unity-bursted.1319217/

Antypodish · Feb 7, 2024

Hey, it looks like great asset.
It works mostly on my end in Unity 2022.

Any approach to make it work with jobs?
It has managed data types.

Search Unity

Unity ID

Useful Searches

runtime ai scripting in burst

Attached Files: