
Is it possible to do AR with motion capture instead of image processing?

Discussion in 'AR/VR (XR) Discussion' started by optijapan, Jun 21, 2018.

  1. optijapan

    optijapan

    Joined:
    Nov 22, 2017
    Posts:
    11
    Hello!

    We are using a motion capture system to track the position and rotation of a webcam and a small flat object. This data is then streamed to Unity, where a Unity Camera follows the real webcam's pose and a 3D cube follows the pose of the other object. Finally, the webcam's output is rendered in the background using a WebCamTexture.
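
    For reference, here is a minimal sketch of this setup as a single component. The MocapClient calls are hypothetical placeholders for our motion capture SDK's streaming API:

    Code (CSharp):

    using UnityEngine;

    // Minimal sketch of the setup described above. MocapClient and its GetPose
    // calls are hypothetical stand-ins for the motion capture SDK's streaming API.
    public class MocapDrivenAr : MonoBehaviour
    {
        public Transform virtualCube;  // follows the tracked flat object
        public Renderer background;    // quad showing the webcam feed behind the scene

        WebCamTexture webcamTex;

        void Start()
        {
            webcamTex = new WebCamTexture();
            background.material.mainTexture = webcamTex;
            webcamTex.Play();
        }

        void Update()
        {
            // Hypothetical streaming calls; substitute your SDK's equivalents.
            Pose camPose = MocapClient.GetPose("Webcam");
            Pose objPose = MocapClient.GetPose("FlatObject");

            transform.SetPositionAndRotation(camPose.position, camPose.rotation);
            virtualCube.SetPositionAndRotation(objPose.position, objPose.rotation);
        }
    }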

    Ideally, when the camera points at the other object, the virtual cube should perfectly match the position of the real object seen in the background image, and it should look 'anchored' to the real object while moving and rotating around it. But this is not entirely the case. After manually aligning the virtual cube with the real object at an arbitrary screen location (by applying a small offset to the camera until the cube sits where it should), the objects are aligned only at that point; any change in perspective makes them drift apart again, although they stay matched whenever the camera returns to the perspective where they were originally aligned.

    When matching the objects at the center of the screen, we noticed that moving the flat object toward the sides of the screen, or closer to/further from the camera, causes the largest offset between the virtual and real objects; it looks as if the distance units do not entirely match between the two systems.

    We have tried adjusting the Unity camera's FOV to match the webcam's FOV, and applying a distortion effect to the webcam output to compensate for lens distortion (using values obtained from OpenCV's calibration tools), but the issue remains. We also tried using a QR code to correct any offset between the tracked camera and the Unity camera; although this perfectly matched the objects near the QR code, a slight offset still appeared whenever the tracked object moved away from it, and when we switched back to the motion capture data the issue reappeared.
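
    For the FOV matching, we derived Unity's vertical field of view from the OpenCV intrinsics using the pinhole relation fovY = 2 * atan(h / (2 * fy)). A sketch (the fy and imageHeight values shown are placeholders; use the ones from your own calibration):

    Code (CSharp):

    using UnityEngine;

    // Sketch: set Unity's vertical FOV from OpenCV calibration intrinsics.
    // Placeholder values; substitute the fy and image height from calibration.
    public class IntrinsicsToFov : MonoBehaviour
    {
        public float fy = 1100f;          // vertical focal length in pixels
        public float imageHeight = 1080f; // calibrated image height in pixels

        void Start()
        {
            // Pinhole model: fovY = 2 * atan(h / (2 * fy)), converted to degrees.
            float fovY = 2f * Mathf.Atan(imageHeight / (2f * fy)) * Mathf.Rad2Deg;
            GetComponent<Camera>().fieldOfView = fovY;
        }
    }

    One thing worth checking: if the calibrated principal point (cx, cy) is not at the image center, matching the FOV alone cannot line things up, and that offset has to go into a custom Camera.projectionMatrix.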

    Edit:
    Here is a video that shows the issue more clearly:


    Is there something we are missing? Is this idea possible at all?

    Any lead or clue would be a great help!

    Thank you so much!
     
    Last edited: Jun 21, 2018
  2. JoeStrout

    JoeStrout

    Joined:
    Jan 14, 2011
    Posts:
    9,859
    It may be very hard to perfectly match the real camera's transformation to Unity's camera from first principles alone. But you should be able to measure the difference, and so correct for it.

    So I would suggest: use a QR code, or some other object you can track very precisely in the camera view, and track it with your external motion-capture system as well. Now set up an app that continuously logs both the position of the object as reported by the motion-tracking system (perhaps after a fairly simple perspective transformation) and the position of the object as actually measured in the camera view.

    Now move this object all around the view, while logging the data.

    Now you have a big table of XY values as calculated from the motion-tracking system, and XY values as seen by the camera. You can use curve-fitting (or more advanced machine-learning techniques, if necessary) to create a function that maps one to the other.
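
    For instance, the simplest version of that fit is an affine map u = a*x + b*y + c, solved by ordinary least squares once per output coordinate. A sketch in plain C# with no external math library (all names are illustrative):

    Code (CSharp):

    using UnityEngine;
    using System.Collections.Generic;

    // Sketch: least-squares fit of u = a*x + b*y + c, mapping mocap-projected
    // 2D points (src) to one coordinate of the camera-measured points (target).
    public static class AffineFit
    {
        // Returns (a, b, c) minimizing the sum of (a*x_i + b*y_i + c - t_i)^2.
        public static Vector3 Solve(List<Vector2> src, List<float> target)
        {
            // Accumulate the symmetric 3x3 normal equations.
            double sxx = 0, sxy = 0, sx = 0, syy = 0, sy = 0, n = src.Count;
            double sxt = 0, syt = 0, st = 0;
            for (int i = 0; i < src.Count; i++)
            {
                double x = src[i].x, y = src[i].y, t = target[i];
                sxx += x * x; sxy += x * y; sx += x;
                syy += y * y; sy += y;
                sxt += x * t; syt += y * t; st += t;
            }
            // Solve the 3x3 system by Cramer's rule.
            double det = sxx * (syy * n - sy * sy) - sxy * (sxy * n - sy * sx) + sx * (sxy * sy - syy * sx);
            double a = (sxt * (syy * n - sy * sy) - sxy * (syt * n - sy * st) + sx * (syt * sy - syy * st)) / det;
            double b = (sxx * (syt * n - st * sy) - sxt * (sxy * n - sy * sx) + sx * (sxy * st - syt * sx)) / det;
            double c = (sxx * (syy * st - syt * sy) - sxy * (sxy * st - syt * sx) + sxt * (sxy * sy - syy * sx)) / det;
            return new Vector3((float)a, (float)b, (float)c);
        }
    }

    You'd call Solve twice: once with the camera-measured u values as the target, once with the v values, and then apply the two coefficient triples to every incoming mocap point. If the residuals still show curvature, step up to a higher-order polynomial or a homography.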
     
  3. optijapan

    optijapan

    Joined:
    Nov 22, 2017
    Posts:
    11
    So, do you believe it is a simple mismatch between the real camera's sensor position and Unity's camera transform (probably with a small discrepancy between the two FOV values added into the mix)?

    After applying the curve-fitting function to the table of values, what would be the expected structure of the result? Would it return a Vector3 as the approximate offset between the two cameras, or a function that continuously corrects the offset depending on the values it is fed?

    Thank you so much for your help!
     
    Last edited: Jun 22, 2018
  4. JoeStrout

    JoeStrout

    Joined:
    Jan 14, 2011
    Posts:
    9,859
    I don't know about "simple", but yes, what you have described sounds like a mismatch. There should be some relatively smooth function that can map from one coordinate system to the other.

    I would do this transformation in 2D space — i.e., take the object position as sensed with your external sensor, and put it through a projection transformation that gets it as close to 2D camera space as you can. Then compare this to the actual image position as seen by the camera. What you're trying to find is the correction function that maps one 2D position to the other.
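
    As a sketch (assuming the mocap pose is already expressed in Unity world space, and that measuredViewportPos comes from whatever marker detection you run on the webcam frame):

    Code (CSharp):

    using UnityEngine;

    // Sketch: log the pair of 2D positions described above, plus their difference.
    // mocapWorldPos and measuredViewportPos are assumed inputs from the tracker
    // and the image-processing step, respectively.
    public class ResidualLogger : MonoBehaviour
    {
        public Camera cam;

        public Vector2 Residual(Vector3 mocapWorldPos, Vector2 measuredViewportPos)
        {
            // Project the mocap-reported 3D position into viewport space (0..1).
            Vector3 projected = cam.WorldToViewportPoint(mocapWorldPos);

            // The correction function you want maps 'projected' to 'measured'.
            Vector2 residual = measuredViewportPos - (Vector2)projected;
            Debug.Log("projected: " + (Vector2)projected +
                      "  measured: " + measuredViewportPos +
                      "  residual: " + residual);
            return residual;
        }
    }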
     
  5. optijapan

    optijapan

    Joined:
    Nov 22, 2017
    Posts:
    11
    I'm sorry if this is a dumb question, but how would I get a projection transformation close to the 2D camera space? Is this related to the Unity camera matrix, or is it about raycasting from the screen?
     
  6. JoeStrout

    JoeStrout

    Joined:
    Jan 14, 2011
    Posts:
    9,859
    I just mean you take the 3D coordinates from your external motion-capture system, and convert them into 2D space using a simple perspective projection. But yes, I guess you could do this using the Unity camera matrix.
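
    Something like this, as a sketch (fx, fy, cx, cy are the intrinsics from your OpenCV calibration, and the input point must first be expressed in the camera's own coordinate frame, e.g. with cam.transform.InverseTransformPoint):

    Code (CSharp):

    using UnityEngine;

    // Sketch of a simple pinhole perspective projection: camera-space point in,
    // pixel coordinates out. Intrinsics come from your OpenCV calibration.
    public static class PinholeProjection
    {
        public static Vector2 Project(Vector3 pCam, float fx, float fy, float cx, float cy)
        {
            // Divide by depth, scale by focal length, shift by principal point.
            // Note: OpenCV pixel coordinates run top-down, Unity's viewport
            // runs bottom-up, so watch the y-axis direction when comparing.
            float u = fx * (pCam.x / pCam.z) + cx;
            float v = fy * (pCam.y / pCam.z) + cy;
            return new Vector2(u, v);
        }
    }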
     
  7. optijapan

    optijapan

    Joined:
    Nov 22, 2017
    Posts:
    11
    I think I get the main idea now. Thank you so much for your help! Will give it a try!
     
    JoeStrout likes this.