DLT to/from intrinsic + extrinsic
This is a quick technical note on a rather obscure topic: how to convert between two different ways of representing cameras for 3D triangulation using a pinhole camera model, DLT and intrinsic + extrinsic parameters. I’ve been asked about this a few times and have worked it out once or twice for different projects, so along with Talia Weiss I decided to write out a note on the conversions. As Talia remarks, the canonical textbook for this field (Multiple View Geometry, by Hartley & Zisserman) says you can do it but doesn’t bother showing you how.
What does intrinsic + extrinsic mean?
Intrinsic + extrinsic refers to the parameters needed to describe the lens & sensor properties of a camera (intrinsic parameters) and its position and orientation in 3D space (extrinsic parameters) using a pinhole camera model. The extrinsic parameters contain 6 degrees of freedom (a 3D rotation and a 3D translation), the intrinsic parameters contain 5 degrees of freedom – two for the lens focal length (potentially different in vertical and horizontal!), two for the pixel position of the image center in the vertical and horizontal, and one to describe the skew, or deviation of the pixels from rectangularity. You really hope skew is 0, and can assume that it is for any camera you’re likely to encounter. See any computer vision textbook or website for further details.
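As a rough illustration of how these parameters fit together, here is a minimal Python sketch of a pinhole projection. The function name project_pinhole, the example numbers, and the convention x_cam = R*X + t are assumptions made for this example; your calibration package may define things differently.

```python
def project_pinhole(X, K, R, t):
    """Project a 3D world point X to pixel coordinates (u, v).

    Assumed convention: x_cam = R * X + t, then pixels = K * x_cam.
    """
    # World coordinates -> camera coordinates (extrinsic parameters)
    x_cam = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera coordinates -> homogeneous pixel coordinates (intrinsic parameters)
    h = [sum(K[i][j] * x_cam[j] for j in range(3)) for i in range(3)]
    # Perspective divide
    return h[0] / h[2], h[1] / h[2]


# Hypothetical intrinsics: focal lengths and image center in pixels, zero skew
f_x, f_y = 1000.0, 1000.0
p_x, p_y = 320.0, 240.0
skew = 0.0
K = [[f_x, skew, p_x],
     [0.0, f_y, p_y],
     [0.0, 0.0, 1.0]]

# Hypothetical extrinsics: camera at the world origin, no rotation
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

u, v = project_pinhole([0.1, -0.2, 2.0], K, R, t)
print(u, v)
```

The skew term sits in the first row, second column of K; with skew set to 0 the horizontal pixel coordinate depends only on f_x and p_x.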
What is DLT?
DLT stands for Direct Linear Transformation, which packs 10 degrees of freedom (no skew!) into 11 different parameters, none of which matches any of the 11 extrinsic + intrinsic parameters. Thus, DLT doesn’t reflect any of the physical parameters of the system but it is fast and easy to compute from a small set of known 3D points and the corresponding 2D pixel locations on the camera image. DLT also simplifies triangulation of a 3D point from observations in n cameras, so DLT is of some practical use. See http://kwon3d.com/theory/dlt/dlt.html for a nice overview.
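To make the 11 coefficients a little more concrete, here is a small Python sketch of the standard DLT projection equations, which map a 3D point to pixel coordinates as a ratio of linear expressions. The function name dlt_project and the coefficient values are made up for illustration; the coefficients correspond to a hypothetical camera and are not taken from any real calibration.

```python
def dlt_project(L, X):
    """Project 3D point X = (x, y, z) using 11 DLT coefficients.

    L[0] holds coefficient 1, L[10] holds coefficient 11.
    """
    x, y, z = X
    # Shared denominator: coefficients 9-11 plus the implicit 12th value of 1
    w = L[8] * x + L[9] * y + L[10] * z + 1.0
    u = (L[0] * x + L[1] * y + L[2] * z + L[3]) / w
    v = (L[4] * x + L[5] * y + L[6] * z + L[7]) / w
    return u, v


# Hypothetical coefficients for illustration only
L = [100.0, 0.0, 32.0, 320.0,
     0.0, 100.0, 24.0, 240.0,
     0.0, 0.0, 0.1]

u, v = dlt_project(L, (1.0, 2.0, 10.0))
print(u, v)
```

Multiplying both equations through by the denominator gives two equations per camera that are linear in x, y and z, which is why triangulation from n cameras reduces to a linear least-squares problem.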
Why convert between the two representations?
Usually because you got output in one form from one method or program, and need the other form for some other method or program.
DLT to intrinsic + extrinsic:
Here’s a MATLAB routine I wrote that does it [DLTcameraPosition.m]. I’ve completely forgotten the math for this and my comments in the code are not much help, but it works*.
* It actually can’t work perfectly for DLT coefficients calculated with the simplest method, an SVD solution to an overdetermined linear system. These solutions typically contain a rotation about non-orthogonal axes that can’t be nicely represented in a 3×3 rotation matrix as is used in pinhole extrinsic parameters. Oops!! The error may or may not be negligible depending on your needs. DLT coefficients calculated with the modified DLT method (http://www.kwon3d.com/theory/dlt/mdlt.html) don’t have this problem and can be converted to intrinsic + extrinsic form without error.
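For readers without MATLAB, here is a hedged Python sketch of one closed-form way to do this conversion. It is not the DLTcameraPosition.m routine and may differ from it. It assumes zero skew, positive focal lengths, and the convention x_cam = R*X + t, and the name dlt_to_pinhole is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def dlt_to_pinhole(L):
    """Split 11 DLT coefficients into K, R, t, assuming zero skew."""
    rows = [list(L[0:4]), list(L[4:8]), list(L[8:11]) + [1.0]]
    M = [r[:3] for r in rows]
    # Rescale so the third row of M has unit length, as a rotation row must;
    # flip the sign if needed so the recovered rotation has determinant +1.
    s = 1.0 / math.sqrt(dot(M[2], M[2]))
    if det3(M) < 0:
        s = -s
    m1, m2, m3 = ([s * x for x in r] for r in M)
    p4 = [s * r[3] for r in rows]
    # Image center: projections of rows 1 and 2 onto the optical axis (row 3)
    p_x, p_y = dot(m1, m3), dot(m2, m3)
    # What is left after removing the image-center part is f * (rotation row)
    a1 = [x - p_x * z for x, z in zip(m1, m3)]
    a2 = [y - p_y * z for y, z in zip(m2, m3)]
    f_x, f_y = math.sqrt(dot(a1, a1)), math.sqrt(dot(a2, a2))
    R = [[x / f_x for x in a1], [y / f_y for y in a2], m3]
    t_z = p4[2]
    t = [(p4[0] - p_x * t_z) / f_x, (p4[1] - p_y * t_z) / f_y, t_z]
    K = [[f_x, 0.0, p_x], [0.0, f_y, p_y], [0.0, 0.0, 1.0]]
    return K, R, t


# Hypothetical coefficients, built from a camera rotated 90 degrees about z
L = [0.0, -100.0, 32.0, 420.0,
     100.0, 0.0, 24.0, 440.0,
     0.0, 0.0, 0.1]
K, R, t = dlt_to_pinhole(L)
# Rows 1 and 2 of a true rotation are orthogonal, so this should be near 0
orthogonality_error = dot(R[0], R[1])
print(K, R, t, orthogonality_error)
```

In this sketch the first two rows of R are each orthogonal to the third by construction, but nothing forces them to be orthogonal to each other. A clearly non-zero orthogonality_error is one way to see the non-orthogonal-axes problem described in the footnote.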
Intrinsic + extrinsic to DLT:
First, set up a whole-system transformation. Intrinsic + extrinsic solutions to n-camera calibration problems often put one camera at the origin with no rotation. This is numerically problematic later on, because the final normalization divides by the bottom-right element of the 3×4 matrix, and that element is zero for a camera with zero translation. Put a whole-system offset in here ONLY if your intrinsic + extrinsic parameters have one of the cameras located at [0,0,0].
eR = eye(3) # external rotation, a 3x3 identity matrix
eT = [10 10 10] # external translation, 10 units in each direction
eRT = [eR, eT'
       0 0 0 1] # external translation and rotation, homogenized to a 4x4 matrix
Second, set up matrix K of the camera intrinsics:
K = [f_x  0   p_x
      0  f_y  p_y
      0   0    1 ]
# f_x,y are the horizontal and vertical focal lengths,
# p_x,y are the horizontal and vertical image center in pixels
Third, combine the extrinsics from your calibration into a 4×4 matrix:
P = [R T'
     0 0 0 1]
# R is the extrinsic rotation matrix, T is the extrinsic translation vector
This looks really straightforward, but there’s some hidden complexity because different packages for computing the intrinsic and extrinsic parameters may report translation vector T differently. Some report it with the rotation already included, e.g. R*T, others do not. Some even appear to report a negative translation. In any case, what’s needed here is the translation following rotation, so your implementation might have something as complicated as -R*T’ in place of T’. If things look like they’re not working, this is probably why – test out all the possibilities until you find what works!
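One common case, sketched in Python under stated assumptions: if a package reports the camera's position C in world coordinates rather than the translation following rotation, then the translation the pinhole model needs is t = -R*C. The function name translation_from_center and the numbers are hypothetical, and whether your package reports C, t, or something else is exactly what you have to find out.

```python
def translation_from_center(R, C):
    """Return t = -R * C, the translation following rotation.

    Assumes C is the camera position in world coordinates and that the
    camera model is x_cam = R * X + t.
    """
    return [-sum(R[i][j] * C[j] for j in range(3)) for i in range(3)]


# Hypothetical camera: rotated 90 degrees about z, positioned at (1, 2, 3)
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
C = [1.0, 2.0, 3.0]
t = translation_from_center(R, C)

# Sanity check: the camera's own position must land at the camera origin
center_in_camera = [sum(R[i][j] * C[j] for j in range(3)) + t[i]
                    for i in range(3)]
print(t, center_in_camera)
```

The sanity check at the end is a cheap way to tell the conventions apart: with the right t, the camera position maps to zero in camera coordinates.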
Next, set up a 3×3 identity matrix m with a 4th column of all zeros to grease the wheels:
m = [1 0 0 0
     0 1 0 0
     0 0 1 0]
Next, calculate a 12-parameter version of the DLT coefficients as follows:
DLT = K*m*P # or DLT = K*m*P*eRT if one of the cameras is located at the origin
This should give you a 3-row, 4-column array. Normalize the result by dividing by the final value:
nDLT = DLT/DLT(3,4) # 3rd row, 4th column
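Read row by row, the first 11 values of nDLT are now the 11 DLT coefficients, hooray! (MATLAB's linear indexing runs down the columns, so transpose nDLT first if you pull the values out that way.)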
nDLT = [DLT_01 DLT_02 DLT_03 DLT_04
        DLT_05 DLT_06 DLT_07 DLT_08
        DLT_09 DLT_10 DLT_11   -   ]
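The steps above can be strung together in a short Python sketch. It uses plain lists so it runs without extra libraries; the names pinhole_to_dlt and matmul and the example camera values are assumptions for illustration, and t is taken to be the translation following rotation.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def pinhole_to_dlt(K, R, t, offset=None):
    """Return the 11 DLT coefficients for intrinsics K and extrinsics R, t.

    offset, if given, is the whole-system translation (eT in the text).
    """
    # 4x4 extrinsic matrix [R t; 0 0 0 1]
    P = [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
    # 3x3 identity with a 4th column of zeros
    m = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    D = matmul(matmul(K, m), P)
    if offset is not None:
        eRT = [[1.0, 0.0, 0.0, offset[0]],
               [0.0, 1.0, 0.0, offset[1]],
               [0.0, 0.0, 1.0, offset[2]],
               [0.0, 0.0, 0.0, 1.0]]
        D = matmul(D, eRT)
    # Normalize by the bottom-right value, then read off row by row
    scale = D[2][3]
    flat = [value / scale for row in D for value in row]
    return flat[:11]


# Hypothetical camera: focal length 1000 px, image center (320, 240)
K = [[1000.0, 0.0, 320.0],
     [0.0, 1000.0, 240.0],
     [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A camera away from the origin needs no offset
coeffs = pinhole_to_dlt(K, identity, [0.0, 0.0, 10.0])

# A camera at the origin would divide by zero without the offset
coeffs_origin = pinhole_to_dlt(K, identity, [0.0, 0.0, 0.0],
                               offset=[10.0, 10.0, 10.0])
print(coeffs)
print(coeffs_origin)
```

If you apply the offset, apply the same one to every camera in the system, since it redefines the shared world coordinate frame.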
If you’d like a more complete write-up explaining the underlying math, please see Talia Weiss’s write-up here, with nice-looking equations instead of crummy WordPress <pre> tags.