Capturing screenshots with aligned content

Discussion in 'VR' started by EdgarSantos, Jun 17, 2019.

    EdgarSantos
    Joined: Nov 11, 2013
    Posts: 27
    Hello,

    I'm trying to make the following work:
    1) Capture screenshots (with holograms) on HoloLens and save the necessary data.
    2) Run a program elsewhere (e.g. a PC) that replicates the HoloLens conditions (e.g. hologram position, camera position) and generates content (e.g. an outline around an object) that aligns perfectly on top of the screenshots captured by the HoloLens.

    Approach 1:
    1) On HoloLens, use the PhotoCapture API and save each screenshot together with the projection and cameraToWorld matrices provided by the API (sketched below).
    2) When post-processing on the PC, use these matrices to set up the camera correctly, position the objects the same way as on the HoloLens, and apply the necessary effects.
    Result: This works perfectly; the post-generated visual content aligns (most of the time, anyway) with the screenshots taken by the HoloLens.
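
    For reference, the capture side of Approach 1 looks roughly like this. This is a simplified sketch: I'm assuming the UnityEngine.XR.WSA.WebCam namespace (matching my Unity version), and SaveFrameData is just a placeholder for the code that writes the raw buffer and the two matrices to disk.

    Code (CSharp):
    using System.Collections.Generic;
    using System.Linq;
    using UnityEngine;
    using UnityEngine.XR.WSA.WebCam;

    public class ScreenshotCapture : MonoBehaviour
    {
        private PhotoCapture photoCapture;

        void Start()
        {
            // Show holograms in the capture, like the in-device screenshots.
            PhotoCapture.CreateAsync(true, capture =>
            {
                photoCapture = capture;

                // Pick the highest resolution supported by the locatable camera.
                Resolution res = PhotoCapture.SupportedResolutions
                    .OrderByDescending(r => r.width * r.height).First();

                var cameraParams = new CameraParameters
                {
                    hologramOpacity = 1.0f,
                    cameraResolutionWidth = res.width,
                    cameraResolutionHeight = res.height,
                    pixelFormat = CapturePixelFormat.BGRA32
                };

                photoCapture.StartPhotoModeAsync(cameraParams,
                    result => photoCapture.TakePhotoAsync(OnPhotoCaptured));
            });
        }

        private void OnPhotoCaptured(PhotoCapture.PhotoCaptureResult result,
                                     PhotoCaptureFrame frame)
        {
            // The two matrices needed to reproduce this view on the PC.
            Matrix4x4 cameraToWorld;
            Matrix4x4 projection;
            frame.TryGetCameraToWorldMatrix(out cameraToWorld);
            frame.TryGetProjectionMatrix(out projection);

            // Save the raw pixels as-is; PNG encoding happens offline later.
            var rawImage = new List<byte>();
            frame.CopyRawImageDataIntoBuffer(rawImage);

            SaveFrameData(rawImage, cameraToWorld, projection); // placeholder
        }

        private void SaveFrameData(List<byte> rawImage,
                                   Matrix4x4 cameraToWorld, Matrix4x4 projection)
        {
            // Writes the buffer and matrices to persistent storage (omitted).
        }
    }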

    The problem I have with Approach 1 is that the PhotoCapture API only provides frames at around 2-3 FPS (and that is without encoding to PNG; I just save the raw data to disk and encode it later, offline). I would like a more regular stream of data, e.g. capturing a video and then generating the post-processed content for each frame of the video. This led me to Approach 2.

    Approach 2:
    1) On HoloLens, use the VideoCapture API to capture and save a video while recording the virtual camera's position and rotation every frame (sketched after this list).
    2) At the start of the video capture, take a PhotoCapture screenshot and use its matrices to calculate the relative position and rotation between the virtual camera and the locatable camera. This assumes the transformation is constant (which I don't think it is; more on that below).
    3) In post-processing, synchronize the video with the recorded camera data (this is done empirically, based on specific actions in the video). This step only exists to make sure the captured camera data lines up with the frames of the video (we found the video was delayed by around 0.5 s).
    4) After this, transform the camera data for each frame using the locatable-camera offsets calculated in (2).
    Result: This kind of works for a few video frames (while the camera pose is stable and looking in the initial direction). Once the camera changes perspective, the perfect alignment starts to drift (even though it stays fairly close).
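
    The recording side of Approach 2 is roughly this (again a simplified sketch: resolution, frame rate and the output path are example values, and the recorded pose samples are later matched to video frames using the empirically measured ~0.5 s delay):

    Code (CSharp):
    using System.Collections.Generic;
    using System.IO;
    using UnityEngine;
    using UnityEngine.XR.WSA.WebCam;

    public class VideoPoseRecorder : MonoBehaviour
    {
        private struct PoseSample
        {
            public float time;
            public Vector3 position;
            public Quaternion rotation;
        }

        private readonly List<PoseSample> samples = new List<PoseSample>();
        private VideoCapture videoCapture;
        private bool recording;

        void Start()
        {
            // Capture with holograms composited into the video.
            VideoCapture.CreateAsync(true, capture =>
            {
                videoCapture = capture;

                var cameraParams = new CameraParameters
                {
                    hologramOpacity = 1.0f,
                    frameRate = 30,
                    cameraResolutionWidth = 1280,
                    cameraResolutionHeight = 720,
                    pixelFormat = CapturePixelFormat.BGRA32
                };

                videoCapture.StartVideoModeAsync(cameraParams,
                    VideoCapture.AudioState.None,
                    startResult =>
                    {
                        string path = Path.Combine(Application.persistentDataPath, "capture.mp4");
                        videoCapture.StartRecordingAsync(path, recResult => recording = true);
                    });
            });
        }

        void LateUpdate()
        {
            if (!recording) return;

            // Record the virtual (Unity) camera pose every rendered frame,
            // stamped with app time so it can be synchronized with the video.
            samples.Add(new PoseSample
            {
                time = Time.time,
                position = Camera.main.transform.position,
                rotation = Camera.main.transform.rotation
            });
        }
    }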

    My question is: is Approach 2 possible at all? What is the best way to calculate the virtual-camera-to-locatable-camera transformation? I tried the following without success:

    A) Calculate the locatable camera position/rotation from the cameraToWorld matrix, then keep the position and rotation offsets relative to the virtual camera and apply them to the recorded virtual camera poses in post-processing.
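
    Concretely, for (A) I'm computing something like the following. The -Z forward convention for the locatable camera's cameraToWorld matrix is my assumption, and recordedPos / recordedRot stand for the per-frame virtual camera samples recorded during the video.

    Code (CSharp):
    // Locatable camera pose extracted from the cameraToWorld matrix of the
    // PhotoCapture frame taken at the start of the recording.
    Vector3 locatablePos = cameraToWorld.GetColumn(3);
    Quaternion locatableRot = Quaternion.LookRotation(
        -cameraToWorld.GetColumn(2),   // assuming the camera looks down -Z
        cameraToWorld.GetColumn(1));

    // Offsets of the locatable camera expressed in the virtual camera's local frame.
    Transform virtualCam = Camera.main.transform;
    Quaternion rotOffset = Quaternion.Inverse(virtualCam.rotation) * locatableRot;
    Vector3 posOffset = Quaternion.Inverse(virtualCam.rotation) * (locatablePos - virtualCam.position);

    // In post-processing, apply the stored offsets to each recorded virtual camera pose.
    Quaternion estimatedLocatableRot = recordedRot * rotOffset;
    Vector3 estimatedLocatablePos = recordedPos + recordedRot * posOffset;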

    B) Extract the position and rotation from the cameraToWorld matrix itself, do the same for the virtual camera's cameraToWorld matrix, and keep that offset instead.
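
    And (B) is essentially the matrix-level version of the same thing. Here virtualCamToWorld comes from Camera.main.cameraToWorldMatrix, which I'm assuming is directly comparable to the locatable camera's cameraToWorld matrix (newVirtualCamToWorld stands for the matrix rebuilt from a recorded virtual pose in post-processing).

    Code (CSharp):
    // cameraToWorld of the locatable camera (from the PhotoCaptureFrame) and of the
    // virtual camera (Unity's Camera.cameraToWorldMatrix), taken at the same moment.
    Matrix4x4 locatableCamToWorld = cameraToWorld;
    Matrix4x4 virtualCamToWorld = Camera.main.cameraToWorldMatrix;

    // Supposedly fixed offset mapping the virtual camera frame to the locatable camera frame.
    Matrix4x4 offset = virtualCamToWorld.inverse * locatableCamToWorld;

    // In post-processing, estimate the locatable camera for any later virtual camera pose.
    Matrix4x4 estimatedLocatableCamToWorld = newVirtualCamToWorld * offset;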

    Final notes: since neither A nor B was successful, I suspect the transformation between the virtual camera and the locatable camera is not constant. Can someone share their impressions on all of this?
    Is there a better way to get a steady stream of camera data that includes the projection and cameraToWorld matrices?

    Thank you