Question Camera Intrinsic Matrix

Discussion in 'Computer Vision' started by dsriaditya999, Jul 18, 2021.

  1. dsriaditya999

    dsriaditya999

    Joined:
    Jul 5, 2021
    Posts:
    3
    Backstory:
    Hello, I have been trying to implement a pose estimation pipeline using the UR3 robot (similar to the Pose Estimation Tutorial). I have been using the Perception package for data collection and the 3D bounding box labeler for storing ground truth. I am trying to use a pose estimation model similar to DOPE (https://arxiv.org/abs/1809.10790). This method requires the 2D ground-truth locations of the 3D bounding box vertices for training the model. Then, at inference time, we can use a PnP algorithm (cv2.solvePnP()) to get the pose of the object. This can later be used for pick and place, as shown in the tutorial.
    (I am a little new to the field of computer vision, so please bear with me :))

    Issue:

    I have read the documentation for the JSON dataset schema (https://docs.unity3d.com/Packages/c...6/manual/Schema/Synthetic_Dataset_Schema.html), which says that the intrinsic matrix of the camera sensor can be found under captures.sensor--camera_intrinsic. I also checked the MathWorks reference mentioned on the same page. However, the values in my JSON captures do not seem to match that description, so I have some doubts:

    1. I found a negative value in the matrix (see the attached example capture file). Can someone explain this? Is this really the intrinsic matrix of the camera?

    2. Also, are the parameters of this matrix not expressed in pixels?

    3. I am planning to use cv2.projectPoints(object_vertex_coordinates, rvec, tvec, camera_intrinsic_matrix, ...). I can get rvec, tvec, and the vertex coordinates of the bounding box (from its size) from the annotations. For camera_intrinsic_matrix, should I use captures.sensor.camera_intrinsic, as mentioned earlier?

    Along with the answers, any resources for understanding more about these topics would also be very helpful. Thank you for your time!

    example_cap_json.PNG MATLAB_ref_intrinsic.png
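The projection step described in question 3 can be sketched in plain Python as follows. This is only an illustration of the pinhole model, under the assumption that the box vertices are already expressed in an OpenCV-style camera frame (x right, y down, z forward). The helper names, the box size, and the intrinsic values are hypothetical. With real data, cv2.projectPoints does the same job and also handles rvec, tvec, and lens distortion.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned box given its center and size."""
    half = [s / 2.0 for s in size]
    return [
        [center[i] + sign[i] * half[i] for i in range(3)]
        for sign in product((-1, 1), repeat=3)
    ]


def project_pinhole(point, K):
    """Project a camera-frame 3D point to pixel coordinates (zero skew assumed)."""
    x, y, z = point
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return (u, v)


# Hypothetical intrinsics in pixels: fx, fy on the diagonal, (cx, cy) in the last column.
K = [[400.0, 0.0, 320.0], [0.0, 400.0, 240.0], [0.0, 0.0, 1.0]]

# Hypothetical 0.2-unit cube centered 2 units in front of the camera.
corners = box_corners(center=[0.0, 0.0, 2.0], size=[0.2, 0.2, 0.2])
pixels = [project_pinhole(c, K) for c in corners]
for c, p in zip(corners, pixels):
    print(c, "->", (round(p[0], 1), round(p[1], 1)))
```

With a matrix of this form, every entry of K is in pixels, which is why a matrix holding normalized or negative values cannot be passed to cv2.projectPoints unchanged.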
     
  2. d_pepley

    d_pepley

    Joined:
    Oct 2, 2020
    Posts:
    18
    I also had similar issues with the output intrinsic matrix, so I ended up building it on my own during post processing in Python. Not ideal, but it has worked for me.

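    import math as m
    import numpy as np
    # camResX, camResY: image resolution in pixels; camFovV: field of view in degrees (paired with camResY, so presumably vertical)
    cameraFy = camResY*0.5/m.tan((camFovV/2)*m.pi/180)
    cameraFx = cameraFy
    cameraIntMat = np.array([[cameraFx, 0, camResX/2], [0, cameraFy, camResY/2],[0,0,1]])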
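
As a worked example of the formula above, here is a minimal sketch in plain Python (no numpy, so it runs without extra packages). It assumes camFovV is the camera's vertical field of view in degrees, that pixels are square, and that the principal point sits at the image center. The function name intrinsic_from_fov and the 640x480, 60-degree values are made up for illustration.

```python
import math


def intrinsic_from_fov(res_x, res_y, fov_v_deg):
    """Build a 3x3 pinhole intrinsic matrix (in pixels) from a vertical FOV.

    Assumes square pixels (fx == fy), zero skew, and a principal point at
    the image center. These are assumptions, not values read from Unity.
    """
    fy = (res_y * 0.5) / math.tan(math.radians(fov_v_deg) / 2.0)
    fx = fy  # square pixels
    cx, cy = res_x / 2.0, res_y / 2.0
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


K = intrinsic_from_fov(640, 480, 60.0)
for row in K:
    print([round(v, 2) for v in row])
```

For these inputs the focal length comes out at roughly 415.69 pixels, with the principal point at (320, 240).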
     
  3. StevenBorkman

    StevenBorkman

    Joined:
    Jun 17, 2020
    Posts:
    16
    Our dataset insights project (here) uses a Jupyter notebook to visualize the results of a Perception run. It includes example code showing how to use the intrinsic projection matrix to convert the output of the 3D bounding box labeler into 2D camera-relative pixel coordinates. There are two conversion functions, depending on whether your camera uses perspective or orthographic projection. The Python code that we use is here:

    3D bounding box plot file: here


    import numpy


    def _project_pt_to_pixel_location(pt, projection, img_height, img_width):
        """ Projects a 3D coordinate into a pixel location from a perspective camera.

        Applies the passed in projection matrix to project a point from the camera's
        coordinate space into pixel space.

        For a description of the math used in this method, see:
        https://www.scratchapixel.com/lessons/3d-basic-rendering/computing-pixel-coordinates-of-3d-point/

        Args:
            pt (numpy array): The 3D point to project.
            projection (numpy 2D array): The camera's 3x3 projection matrix.
            img_height (int): The height of the image in pixels.
            img_width (int): The width of the image in pixels.

        Returns:
            numpy array: a one-dimensional array with two values (x and y)
            representing a point's pixel coordinate in an image.
        """
        _pt = projection.dot(pt)

        # compute the perspective divide. Near clipping plane should take care of
        # divide by zero cases, but we will check to be sure
        if _pt[2] != 0:
            _pt /= _pt[2]

        return numpy.array(
            [
                int(-(_pt[0] * img_width) / 2.0 + (img_width * 0.5)),
                int((_pt[1] * img_height) / 2.0 + (img_height * 0.5)),
            ]
        )


    def _project_pt_to_pixel_location_orthographic(
        pt, projection, img_height, img_width
    ):
        """ Projects a 3D coordinate into a pixel location from an orthographic camera.

        Applies the passed in projection matrix to project a point from the
        camera's coordinate space into pixel space.

        For a description of the math used in this method, see:
        https://www.scratchapixel.com/lessons/3d-basic-rendering/perspective-and-orthographic-projection-matrix/projection-matrix-introduction

        Args:
            pt (numpy array): The 3D point to project.
            projection (numpy 2D array): The camera's 3x3 projection matrix.
            img_height (int): The height of the image in pixels.
            img_width (int): The width of the image in pixels.

        Returns:
            numpy array: a one-dimensional array with two values (x and y)
            representing a point's pixel coordinate in an image.
        """
        # The 'y' component needs to be flipped because of how Unity works
        projection = numpy.array(
            [
                [projection[0][0], 0, 0],
                [0, -projection[1][1], 0],
                [0, 0, projection[2][2]],
            ]
        )
        temp = projection.dot(pt)

        pixel = [
            int((temp[0] + 1) * 0.5 * img_width),
            int((temp[1] + 1) * 0.5 * img_height),
        ]
        return pixel
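
To see how the perspective function behaves, here is a small usage sketch in plain Python that repeats the same arithmetic without numpy. The matrix values are invented for illustration (roughly a 60-degree vertical FOV at a 4:3 aspect ratio, with a negative bottom-right entry like the one asked about earlier in the thread). They are not taken from a real capture, and the diagonal form of the matrix is an assumption.

```python
def project_to_pixel(pt, projection, img_height, img_width):
    """Same arithmetic as _project_pt_to_pixel_location, written without numpy."""
    # 3x3 matrix times 3-vector.
    p = [sum(projection[r][c] * pt[c] for c in range(3)) for r in range(3)]
    # Perspective divide, guarded against a zero divisor.
    if p[2] != 0:
        p = [v / p[2] for v in p]
    return [
        int(-(p[0] * img_width) / 2.0 + img_width * 0.5),
        int((p[1] * img_height) / 2.0 + img_height * 0.5),
    ]


# Hypothetical projection matrix; the last diagonal entry is negative.
projection = [
    [1.299, 0.0, 0.0],
    [0.0, 1.732, 0.0],
    [0.0, 0.0, -1.0],
]

# A point on the optical axis, and one to the right of and above it.
center = project_to_pixel([0.0, 0.0, 5.0], projection, 480, 640)
right_up = project_to_pixel([1.0, 1.0, 5.0], projection, 480, 640)
print(center)    # [320, 240]
print(right_up)  # [403, 156]
```

In this sketch the entries are unitless scale factors rather than pixel focal lengths. The negative entry flips the sign of x and y during the divide, and the leading minus sign in the x formula flips x back, so the second point lands to the right of and above the image center (pixel y grows downward).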
     
    BBraitling likes this.
  4. dsriaditya999

    dsriaditya999

    Joined:
    Jul 5, 2021
    Posts:
    3
    Thanks a lot! This solved my problem
     
    StevenBorkman likes this.
  5. zimmer550king101

    zimmer550king101

    Joined:
    May 26, 2021
    Posts:
    9
    Can you explain what camFovV is here? Also, why are you not calculating the x and y coordinates of the principal point (cx and cy) and the skew?
     
  6. zimmer550king101

    zimmer550king101

    Joined:
    May 26, 2021
    Posts:
    9
    Isn't a projection matrix a 3 by 4 matrix? It is obtained by multiplying a 3 by 3 intrinsic parameter matrix with a 3 by 4 extrinsic parameter one. Or is it done differently in computer graphics?
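
For reference, the computer-vision convention described in this question can be written out as a short sketch: a 3x4 matrix P = K [R | t] applied to a homogeneous point, followed by a divide by the third component. The numbers and the matmul helper below are made up for illustration and are not taken from the Perception dataset.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical 3x3 intrinsic matrix (pixels).
K = [[400.0, 0.0, 320.0], [0.0, 400.0, 240.0], [0.0, 0.0, 1.0]]

# Hypothetical 3x4 extrinsic matrix [R | t]: identity rotation, with the
# world origin placed 2 units in front of the camera.
Rt = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
]

P = matmul(K, Rt)  # 3x4 projection matrix

# Project a homogeneous world point (x, y, z, 1) and divide by the third component.
world_pt = [[0.1], [0.0], [0.0], [1.0]]
u, v, w = (row[0] for row in matmul(P, world_pt))
pixel = (u / w, v / w)
print(len(P), "x", len(P[0]))  # 3 x 4
print(pixel)                   # (340.0, 240.0)
```

The 3x3 matrix in the functions posted earlier plays a different role: according to their docstrings it is applied to points that are already in the camera's coordinate space, so no extrinsic part is involved at that step.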
     
  7. joyera

    joyera

    Joined:
    Jan 2, 2022
    Posts:
    3
    I am now able to calculate 2D coordinates using the intrinsic matrix, but when I calculate the [R, t] matrix using cv2.solvePnP in Python, the resulting values do not match the [R, t] matrix in the JSON file.