[Help] Render Streaming AR

Discussion in 'Unity Render Streaming' started by jwgsbbc, Aug 9, 2021.

  1. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
    We're currently using the Render Streaming package in a prototype AR application, but have resorted to chroma-key to handle the transparency (so the user can see the camera feed behind the streamed render).

I am looking for a solution that allows the alpha channel of the frame buffer to be transmitted from a Windows machine (with an NVIDIA card) to the client iPhone.

I can see that the WebRTC package currently uses NVIDIA's codec SDK v9.1, which I don't think supports alpha encoding (whereas v11.1 seems to, via HEVC). I suspect this may be a blocker to using the NVIDIA card's hardware encoder for translucent pixels.

    Is there a solution that I'm missing?

    Thanks.
     
  2. kazuki_unity729

    kazuki_unity729

    Unity Technologies

    Joined:
    Aug 2, 2018
    Posts:
    803
It is an interesting challenge for me.
My idea is to combine the alpha channel into the color texture before streaming, and to composite the received texture using a shader.

    This idea is based on the AR demo made by keijiro.
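
For example, the packing step on the sender side could be as simple as a Blit into a double-wide target with a packing material. This is just a rough sketch, not code from keijiro's demo; the material and render textures are placeholders:

Code (CSharp):

using UnityEngine;

// Rough sketch only: pack colour + alpha into one opaque frame before streaming.
// "packMaterial" is a placeholder material whose shader writes colour to the
// left half and alpha-as-greyscale to the right half of the target.
public class AlphaPackBlit : MonoBehaviour
{
    public RenderTexture cameraOutput;   // render target with a valid alpha channel
    public RenderTexture streamTarget;   // double-wide texture handed to Render Streaming
    public Material packMaterial;        // placeholder packing material

    void LateUpdate()
    {
        Graphics.Blit(cameraOutput, streamTarget, packMaterial);
    }
}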
     
    hippocoder and jwgsbbc like this.
  3. cloudending

    cloudending

    Joined:
    Mar 25, 2019
    Posts:
    15
Do you have a workaround for this?
     
    jwgsbbc likes this.
  4. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
Our workaround was as described by @kazuki_unity729, although I've only just seen their response today.

    We split the rendered frame in two - left half for colour, right half for alpha stored as greyscale.

To allow the existing render buffer's alpha channel to be stored in one half of the output, a custom post-process was implemented.

    https://docs.unity3d.com/Manual/PostProcessingOverview.html

The Unity tutorial on custom post-processing for the HDRP (High Definition Render Pipeline) was followed to get an example set up:

    https://docs.unity3d.com/Packages/c...efinition@7.1/manual/Custom-Post-Process.html
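
For context, the C# side from that tutorial pattern looks roughly like the sketch below (the class name, shader path and menu entry are placeholders, not our actual code); the fragment shader further down is what this component ends up running:

Code (CSharp):

using System;
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.HighDefinition;

// Sketch of an HDRP custom post-process component following the tutorial pattern.
// Names here ("ColourAlphaPack", the shader path) are placeholders.
[Serializable, VolumeComponentMenu("Post-processing/Custom/ColourAlphaPack")]
public sealed class ColourAlphaPack : CustomPostProcessVolumeComponent, IPostProcessComponent
{
    public BoolParameter enable = new BoolParameter(false);

    Material m_Material;

    public bool IsActive() => m_Material != null && enable.value;

    // Run after the built-in post-processing so we pack the final image.
    public override CustomPostProcessInjectionPoint injectionPoint =>
        CustomPostProcessInjectionPoint.AfterPostProcess;

    public override void Setup()
    {
        var shader = Shader.Find("Hidden/Shader/ColourAlphaPack");
        if (shader != null)
            m_Material = new Material(shader);
    }

    public override void Render(CommandBuffer cmd, HDCamera camera, RTHandle source, RTHandle destination)
    {
        if (m_Material == null)
            return;
        // Bind the source so the fragment shader can LOAD_TEXTURE2D_X(_InputTexture, ...).
        m_Material.SetTexture("_InputTexture", source);
        HDUtils.DrawFullScreen(cmd, m_Material, destination);
    }

    public override void Cleanup() => CoreUtils.Destroy(m_Material);
}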

By default the HD render pipeline uses a compressed 32-bit frame buffer for post-processing, with 11 bits each for the red and green channels and 10 bits for the blue channel. Unfortunately it has no alpha channel, which meant that reading or writing the alpha channel during the post-processing pass had no effect on the result.

The solution was to change the post-processing buffer format to an alpha-supporting (but twice-the-size, 64-bit) format. Obviously this means the renderer uses more memory, but that seems like a reasonable trade-off.

    See this forum thread for more details:

    https://forum.unity.com/threads/alpha-in-render-texture-must-be-alpha.746405/
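
For illustration of the two formats involved (this sketch is not the change itself, which is described in the thread above): the default buffer is 32-bit R11G11B10 with no alpha, while a 64-bit RGBA half-float target keeps the alpha channel.

Code (CSharp):

using UnityEngine;

public static class AlphaCapableTarget
{
    // Illustration only: ARGBHalf is 16 bits per channel (RGBA), 64 bits per
    // pixel, so the alpha written by the renderer survives, unlike the default
    // R11G11B10 post-processing buffer which has no alpha channel at all.
    public static RenderTexture Create(int width, int height)
    {
        var rt = new RenderTexture(width, height, 24, RenderTextureFormat.ARGBHalf);
        rt.Create();
        return rt;
    }
}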



Once the alpha channel was accessible, the existing greyscale post-processing effect (from the Unity tutorial) was rewritten to squeeze the colour into the left side of the image and write the alpha into the right side. Below is the relevant fragment shader code:

Code (CSharp):

float4 CustomPostProcess(Varyings input) : SV_Target
{
    UNITY_SETUP_STEREO_EYE_INDEX_POST_VERTEX(input);

    // Stretch the coordinates horizontally so each output half samples the full source.
    float2 squishedTexC = input.texcoord * float2(2.0f, 1.0f);
    uint2 positionSS = fmod(squishedTexC, float2(1.0f, 1.0f)) * _ScreenSize.xy;
    float4 srcColour = LOAD_TEXTURE2D_X(_InputTexture, positionSS).rgba;

    float4 colour = float4(srcColour.rgb, 1.0f);
    float4 alpha = float4(srcColour.a, srcColour.a, srcColour.a, 1.0f); // alpha as greyscale

    // avoiding branching:
    // h = 0 when x < 1 (colour half)
    // h = 1 when x >= 1 (alpha half)
    float h = clamp((float)sign(squishedTexC.x - 0.9999f), 0.0f, 1.0f);
    return lerp(colour, alpha, h);
}
    And for decoding on the client side:

Code (CSharp):

fixed4 frag(v2f IN) : SV_Target
{
    // Read the colour from the left side
    float2 colorTexCoord = IN.texcoord * float2(0.5f, 1.0f);
    float3 feedColour = tex2D(_MainTex, colorTexCoord).rgb;

    // ... and the alpha from the right side.
    // Use the luminance of the sample for the alpha, since that should be the
    // highest-resolution component if the video compression used chroma
    // sub-sampling, e.g. YUV 4:2:2.
    float2 alphaTexCoord = colorTexCoord + float2(0.5f, 0.0f);
    float feedAlpha = Luminance(tex2D(_MainTex, alphaTexCoord).rgb);

    // The buffer from the renderer contains the blended colour
    //     (1 - feedAlpha) * c0 + feedAlpha * c1
    // whereas we actually want c1 (to blend using feedAlpha with the
    // feed from the camera).
    // If we presume that the transparent areas (c0) are BLACK, then the
    // colour from the feed is feedAlpha * c1, so c1 = feedColour / feedAlpha.
    // (max() guards against divide-by-zero in fully transparent areas.)
    float3 colour = feedColour / max(feedAlpha, 1e-5f);
    return float4(colour, feedAlpha);
}
    A more efficient way of doing this could be to use a blend mode on the renderer that doesn't do any alpha blending, but this would have implications for the rendering of translucent objects overlapping other objects.
     
  5. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
    Sorry, I've only just seen your reply to my question, many thanks.
     
  6. cloudending

    cloudending

    Joined:
    Mar 25, 2019
    Posts:
    15
Thanks a lot. And I have another question: how do you synchronize the camera image from the client with the rendered image from the server?
     
  7. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
    Currently, we do not - latency affects the quality of experience, potentially drastically.
There are some latency-mitigation strategies that we've briefly looked at, similar to those used in HMDs: Timewarp, Spacewarp, etc. Some of these require more info from the renderer though: depth, motion vectors.
A naive approach, if reducing latency isn't the priority, is to delay the camera feed. This looks possible, but since there's currently no way to add metadata to the render stream (see: https://github.com/Unity-Technologies/com.unity.webrtc/issues/305) we're looking at encoding the sync time in the pixels, i.e. sending the pose time from the client to the renderer, encoding that in the rendered image, and decoding it on the client, then choosing the frame from the camera that corresponds to that time. Currently, for some reason, it doesn't work though :(
     
    Last edited: Nov 17, 2021
  8. cloudending

    cloudending

    Joined:
    Mar 25, 2019
    Posts:
    15
See https://groups.google.com/g/discuss-webrtc/c/npYIyxSBOLI. Francesco Pretto added sender_ntp_time_ms to VideoFrame. It may help with synchronization. But as you say, network latency affects the quality of experience a lot.
     
  9. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
I didn't follow all of that thread, and it's perhaps less relevant to us (currently) since we're dealing with a Unity->Unity client/server setup. If I understand correctly though, it sounds like there's some work to allow the rendered-video-source (renderer) wall-clock time to be added to the frame metadata, which could be useful. But that would still only give an estimate of the round-trip, "motion to photons" time, since one could only estimate the difference between the wall-clock time of the renderer and that of the client. For our use case at least, I think we need to be able to send the user-input time (motion time) from the client to the renderer and then back to the client, so the client can measure the round-trip time itself. This presumably means we need to add arbitrary data into the frames.
     
  10. cloudending

    cloudending

    Joined:
    Mar 25, 2019
    Posts:
    15
Modifying the WebRTC source code would be incompatible with the web, but insertable streams can be used to add custom data. I think this is a way to achieve synchronization.
     
  11. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
We got an encode-motion-time-in-pixels proof of concept working, and it works surprisingly well. In detail: we took the client frameCount (actually just the least significant 8 bits), sent it through to the renderer, and encoded it as 8 black/white (0/1) blocks along the bottom of the streamed frame. The corresponding camera frame is then shown when the tagged render frame reaches the client. We're seeing ~150ms of latency, but it's not tooooooo bad with the camera feed synced up.
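
In case it's useful to anyone, the client-side decode of those blocks is roughly the following (a sketch rather than our exact code; it assumes you have a CPU-readable copy of the received frame):

Code (CSharp):

using UnityEngine;

public static class FrameTagDecoder
{
    // Sketch: read the 8 black/white blocks along the bottom edge of the received
    // frame and reconstruct the least-significant 8 bits of the client frameCount
    // that were encoded on the renderer.
    public static int DecodeFrameCountBits(Texture2D frame, int blockCount = 8)
    {
        int value = 0;
        int blockWidth = frame.width / blockCount;
        int y = 1; // sample just above the bottom edge of the frame

        for (int i = 0; i < blockCount; i++)
        {
            // Sample the centre of each block; treat bright as a 1 bit, dark as 0.
            int x = i * blockWidth + blockWidth / 2;
            Color c = frame.GetPixel(x, y);
            float luminance = 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
            if (luminance > 0.5f)
                value |= 1 << i;
        }

        // Compare against the client's recent (Time.frameCount & 0xFF) history to
        // pick the matching camera frame.
        return value;
    }
}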
     
  12. kazuki_unity729

    kazuki_unity729

    Unity Technologies

    Joined:
    Aug 2, 2018
    Posts:
    803
    cloudending and jwgsbbc like this.
  13. jwgsbbc

    jwgsbbc

    Joined:
    Mar 17, 2021
    Posts:
    7
Our current solution, which is simpler, is working pretty well, but this looks really interesting. Thanks.
     
  14. cloudending

    cloudending

    Joined:
    Mar 25, 2019
    Posts:
    15
This project seems like the solution jwgsbbc proposed. It is a good way to sync the rendered frame and the camera frame.