Camera intrinsic matrix (3x3) has values only on the main diagonal, and one of them is negative?

Discussion in 'Computer Vision' started by zimmer550king101, Oct 6, 2022.

  1. zimmer550king101

    Joined:
    May 26, 2021
    Posts:
    9
    I recorded a simulation by moving a camera around a set of static objects. The camera intrinsic parameters were recorded under camera_intrinsic, and the extrinsic parameters were recorded as the translation and rotation inside the ego block of the capture JSON (please correct me if this is wrong).
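    To be concrete, the fields I am using come straight out of the captures file, roughly like this (the file name is just an example, and I am assuming the standard Perception layout with a top-level "captures" array; correct me if yours differs):

    Code (Python):
    import json

    # Example file name; Perception writes a series of captures_XXX.json files.
    with open("captures_000.json") as f:
        data = json.load(f)

    capture = data["captures"][0]                             # first captured frame
    camera_intrinsic = capture["sensor"]["camera_intrinsic"]  # 3x3 nested list
    ego_translation = capture["ego"]["translation"]           # [x, y, z]
    ego_rotation = capture["ego"]["rotation"]                 # quaternion [x, y, z, w]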

    Now, I am experimenting with some triangulation methods (localizing objects using their 2D bounding box and the camera's projection matrix). Below is how I set up everything to get the projection matrix:

    Code (Python):
    import numpy as np
    from scipy.spatial.transform import Rotation as Rot

    # 3x3 intrinsic matrix as recorded in the capture JSON
    int_mat = np.array([
        capture.sensor.camera_intrinsic[0],
        capture.sensor.camera_intrinsic[1],
        capture.sensor.camera_intrinsic[2]
    ])

    # Extrinsics from the ego pose: quaternion -> rotation matrix, plus translation
    r = Rot.from_quat(capture.ego.rotation)
    rot_mat = r.as_matrix()
    t = np.array([capture.ego.translation]).transpose()
    ext_mat = np.hstack((rot_mat, t))   # 3x4 [R | t]

    # 3x4 projection matrix
    proj_mat = int_mat @ ext_mat
    When I fed the projection matrix and the 2D bounding boxes into my custom algorithm, the triangulation was way off. That is when I noticed that the intrinsic matrix only has values on the main diagonal, and that one of them is negative. I come from a computer vision background, not a computer graphics one. Can anyone guide me on how to convert the intrinsic matrix provided by Unity Perception into a standard one with fx, fy, cx, cy, and skew?
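    My current guess, assuming camera_intrinsic is really the 3x3 block of an OpenGL-style projection matrix (entries in normalized device coordinates, principal point at the image centre, zero skew), is to rescale it by the image resolution. image_width and image_height below are just whatever resolution I rendered at; none of this comes from the docs, so corrections are welcome:

    Code (Python):
    import numpy as np

    def unity_intrinsic_to_pinhole(camera_intrinsic, image_width, image_height):
        """Convert the diagonal, NDC-style matrix into a pixel-space pinhole K.

        Assumes [0][0] and [1][1] are the usual GL projection terms
        1/(aspect*tan(fov/2)) and 1/tan(fov/2), principal point at the centre,
        and zero skew. This is my guess, not something from the docs.
        """
        m00 = camera_intrinsic[0][0]
        m11 = camera_intrinsic[1][1]
        fx = m00 * image_width / 2.0
        fy = m11 * image_height / 2.0
        cx = image_width / 2.0
        cy = image_height / 2.0
        return np.array([
            [fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0],
        ])

    For my values, 1.73205 ≈ 1/tan(30°), i.e. a 60° vertical FOV (Unity's default), which is why I suspect this interpretation; the -1.0006 in the last row looks like the GL depth term built from the near/far clip planes, which would not belong in K at all.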

    I looked at the 3D Ground Truth Bounding Boxes section of the Perception_Statistics notebook, but the authors seem to treat the intrinsic matrix as the projection matrix. Isn't the projection matrix supposed to be 3x4, since it is the product of the 3x3 intrinsic matrix and the 3x4 extrinsic matrix?

    I also had a look at a similar question, but the accepted answer there assumes the projection matrix is already given, and another answer tries to use the FOV, which does not appear anywhere in the data I collected with Unity Perception.

    For reference, below are the two relevant parts of the capture JSON file I am using:

    Code (JSON):
    "sensor": {
      "sensor_id": "7fcdda27-3029-4bb9-83f3-9d3eac23a1a1",
      "ego_id": "6a1ebd8b-1417-49a0-befc-8892801a9aa3",
      "modality": "camera",
      "translation": [
        0.0,
        0.0,
        0.0
      ],
      "rotation": [
        0.0,
        0.0,
        0.0,
        1.00000012
      ],
      "camera_intrinsic": [
        [
          0.705989,
          0.0,
          0.0
        ],
        [
          0.0,
          1.73205078,
          0.0
        ],
        [
          0.0,
          0.0,
          -1.0006001
        ]
      ]
    },
    "ego": {
      "ego_id": "6a1ebd8b-1417-49a0-befc-8892801a9aa3",
      "translation": [
        2.32,
        1.203,
        2.378
      ],
      "rotation": [
        -0.0229570474,
        0.976061463,
        -0.173389584,
        -0.129279226
      ],
      "velocity": null,
      "acceleration": null
    }
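
    For completeness, this is the shape of the pipeline I am aiming for once the intrinsics question is settled. I am assuming the ego translation/rotation is the camera pose in world coordinates (camera-to-world), so I invert it to get the world-to-camera extrinsics; I have not yet dealt with Unity's left-handed, Y-up convention versus the usual computer-vision one:

    Code (Python):
    import numpy as np
    from scipy.spatial.transform import Rotation as Rot

    def world_to_camera_extrinsic(ego_rotation_xyzw, ego_translation):
        """Build the 3x4 [R | t] that maps world points into camera coordinates.

        Assumes the ego rotation/translation describe the camera pose in world
        space (camera-to-world), so the extrinsic is its inverse. Any
        handedness/axis differences between Unity and OpenCV conventions are
        deliberately ignored here.
        """
        R_c2w = Rot.from_quat(ego_rotation_xyzw).as_matrix()
        t_c2w = np.asarray(ego_translation, dtype=float).reshape(3, 1)
        R_w2c = R_c2w.T
        t_w2c = -R_w2c @ t_c2w
        return np.hstack((R_w2c, t_w2c))

    # With K from the intrinsic conversion above, the projection I want would be:
    # proj_mat = K @ world_to_camera_extrinsic(capture["ego"]["rotation"],
    #                                          capture["ego"]["translation"])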