Camera intrinsic matrix (3 by 3 matrix) has a negative value and only values on main diagonal?

Discussion in 'Computer Vision' started by zimmer550king101, Oct 6, 2022.

  1. zimmer550king101

    I recorded a simulation by moving around with my camera and viewing a bunch of static objects. The camera intrinsic parameters were recorded as camera_intrinsic and extrinsic parameters were recorded as translation and rotation inside ego of the capture JSON (please correct me if this is wrong).

    Now, I am experimenting with some triangulation methods (localizing objects using their 2D bounding box and the camera's projection matrix). Below is how I set up everything to get the projection matrix:

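    Code (Python):
    import numpy as np
    from scipy.spatial.transform import Rotation as Rot

    # 3x3 matrix stored under "camera_intrinsic" in the capture JSON
    int_mat = np.array([
        capture.sensor.camera_intrinsic[0],
        capture.sensor.camera_intrinsic[1],
        capture.sensor.camera_intrinsic[2]
    ])

    # Ego rotation is a quaternion in (x, y, z, w) order, which is what from_quat expects
    r = Rot.from_quat(capture.ego.rotation)
    rot_mat = r.as_matrix()

    # Ego translation as a 3x1 column vector
    t = np.array([capture.ego.translation]).transpose()

    # 3x4 extrinsic matrix built directly from the ego pose, then the 3x4 projection matrix
    ext_mat = np.hstack((rot_mat, t))
    proj_mat = int_mat @ ext_mat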
    When I fed the projection matrix and the 2D bounding boxes into my custom algorithm, the triangulation was far off. That is when I noticed that the intrinsic matrix only has values on its main diagonal, and that one of them is negative. I come from a computer vision background, not a computer graphics one. Can anyone here guide me on how to convert the intrinsic matrix provided by Unity Perception into a conventional one with fx, fy, cx, cy, and skew?
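One possible reading, sketched here under assumptions rather than confirmed by the Perception documentation: the three diagonal values look like the upper-left 3 by 3 block of an OpenGL-style projection matrix. Under that reading the first two entries are focal lengths in normalized device coordinates and the negative third entry is a depth-range term, not a focal length. The helper name `pixel_intrinsics` and the image size passed to it are hypothetical; the image size is not part of the capture JSON shown in this thread. The sketch also assumes a symmetric frustum (principal point at the image centre) and zero skew.

```python
def pixel_intrinsics(camera_intrinsic, width, height):
    """Map the diagonal of a GL-style projection block to a pixel-unit K.

    Assumptions (not confirmed by the source data):
      * camera_intrinsic[0][0] and [1][1] are focal lengths in normalized
        device coordinates, where the image spans [-1, 1] on each axis
      * the frustum is symmetric, so the principal point is the image centre
      * there is no skew
    width and height are the image size in pixels and must be supplied
    by the caller.
    """
    fx = camera_intrinsic[0][0] * width / 2.0
    fy = camera_intrinsic[1][1] * height / 2.0
    cx = width / 2.0
    cy = height / 2.0
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


# Values copied from the capture JSON in this thread.
camera_intrinsic = [
    [0.705989, 0.0, 0.0],
    [0.0, 1.73205078, 0.0],
    [0.0, 0.0, -1.0006001],
]

# Hypothetical image size, chosen only because its aspect ratio matches
# the ratio of the two diagonal entries.
K = pixel_intrinsics(camera_intrinsic, width=1472, height=600)
```

With square pixels, fx and fy should come out nearly equal, which is a quick check that the image size matches the recording. Depending on whether image rows are counted from the top or the bottom, the y axis may also need flipping.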

    I looked at the 3D Ground Truth Bounding Boxes inside the Perception_Statistics notebook, but the authors seem to treat the intrinsic matrix as the projection matrix. Isn't the projection matrix supposed to be 3 by 4, as the product of the 3 by 3 intrinsic matrix and the 3 by 4 extrinsic matrix?
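For reference, the 3 by 4 composition asked about here can be sketched in plain Python. It rests on an assumption that is not confirmed in this thread: the recorded ego pose is camera-to-world, so the extrinsic matrix is its inverse, [R^T | -R^T t]. The function names are hypothetical, the quaternion is taken in (x, y, z, w) order, and the difference in axis conventions between Unity (left-handed, y up) and the usual computer vision camera frame is deliberately not handled.

```python
def quat_to_matrix(q):
    """Rotation matrix from a quaternion given as (x, y, z, w)."""
    x, y, z, w = q
    # Normalize first, since recorded quaternions are only approximately unit length.
    n = (x * x + y * y + z * z + w * w) ** 0.5
    x, y, z, w = x / n, y / n, z / n, w / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def world_to_camera(rotation, translation):
    """Invert a camera-to-world pose; returns the 3x4 matrix [R^T | -R^T t]."""
    R = quat_to_matrix(rotation)
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t_inv = [-sum(Rt[i][k] * translation[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)]


def projection_matrix(K, extrinsic):
    """3x4 projection matrix P = K @ extrinsic, with K 3x3 and extrinsic 3x4."""
    return [
        [sum(K[i][k] * extrinsic[k][j] for k in range(3)) for j in range(4)]
        for i in range(3)
    ]


# Ego pose copied from the capture JSON in this thread.
ego_rotation = [-0.0229570474, 0.976061463, -0.173389584, -0.129279226]
ego_translation = [2.32, 1.203, 2.378]
extrinsic = world_to_camera(ego_rotation, ego_translation)
```

A quick sanity check on the inversion is that the camera's own world position maps to the origin of the camera frame.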

    I also had a look at a similar question, but the accepted answer there assumes the projection matrix is already provided. Another answer tries to use the FOV, which I don't see anywhere in the data I collected through Unity Perception.

    For reference, below are the two relevant parts of the capture JSON file I am using:

    Code (JSON):
    "sensor": {
      "sensor_id": "7fcdda27-3029-4bb9-83f3-9d3eac23a1a1",
      "ego_id": "6a1ebd8b-1417-49a0-befc-8892801a9aa3",
      "modality": "camera",
      "translation": [
        0.0,
        0.0,
        0.0
      ],
      "rotation": [
        0.0,
        0.0,
        0.0,
        1.00000012
      ],
      "camera_intrinsic": [
        [
          0.705989,
          0.0,
          0.0
        ],
        [
          0.0,
          1.73205078,
          0.0
        ],
        [
          0.0,
          0.0,
          -1.0006001
        ]
      ]
    },
    "ego": {
      "ego_id": "6a1ebd8b-1417-49a0-befc-8892801a9aa3",
      "translation": [
        2.32,
        1.203,
        2.378
      ],
      "rotation": [
        -0.0229570474,
        0.976061463,
        -0.173389584,
        -0.129279226
      ],
      "velocity": null,
      "acceleration": null
    }
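Although no FOV field appears in this JSON, the diagonal values may already encode it. As a hedged sketch: if the first two entries are the usual perspective-projection terms 1 / tan(hfov / 2) and 1 / tan(vfov / 2), the field of view and aspect ratio can be recovered as below. The function name is hypothetical and the interpretation of the entries is an assumption.

```python
import math


def frustum_from_diagonal(m00, m11):
    """Recover field of view (degrees) and aspect ratio from the first two
    diagonal entries, assuming m00 = 1 / tan(hfov / 2) and
    m11 = 1 / tan(vfov / 2) for a symmetric perspective frustum."""
    vfov_deg = math.degrees(2.0 * math.atan(1.0 / m11))
    hfov_deg = math.degrees(2.0 * math.atan(1.0 / m00))
    aspect = m11 / m00  # width / height
    return vfov_deg, hfov_deg, aspect


# Diagonal values copied from "camera_intrinsic" above.
vfov_deg, hfov_deg, aspect = frustum_from_diagonal(0.705989, 1.73205078)
print(f"vertical FOV: {vfov_deg:.2f} deg")
print(f"horizontal FOV: {hfov_deg:.2f} deg")
print(f"aspect ratio: {aspect:.4f}")
```

Under this reading the value 1.73205078 corresponds to a vertical FOV of about 60 degrees. The third entry, -1.0006001, would then be the depth-range term -(far + near) / (far - near) and carries no focal-length information.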