
Figure 2.9: Two landscape images taken from the same spot, i.e. zero baseline. The black lines and dots illustrate the 20 point correspondences that were manually annotated.

Figure 2.10: Based on the 20 annotated point correspondences in Figure 2.9, a homography is estimated and the left image is warped to fit the right, resulting in this figure.

An example: Image Panoramas

A place where a homography is often used to relate two images taken with zero baseline is in generating image panoramas. This is done from a series of images taken from the same spot, cf. Figure 2.10. This result is generated from the two images in Figure 2.9, where 20 point correspondences have been annotated manually, and the method of Section 2.8 is used to estimate the needed homography. This homography is then used to warp the left image of Figure 2.9, cf. Section 2.3.2. The two images are averaged together, resulting in Figure 2.10. It is noted that much better and automated methods exist both for finding the correspondences between the images and for blending them together. The aim here is, however, to illustrate the theory, not to produce polished results.
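To give a feel for the computation involved, the following is a minimal MATLAB sketch of a DLT-style homography estimate via the SVD; the function name and argument layout are chosen here for illustration, and the details (e.g. point normalization) may differ from the method of Section 2.8. q1 and q2 are assumed to be 2×n matrices of corresponding points (n ≥ 4), and the estimated H maps points in the left image to the right.

    % Basic DLT-style homography estimate from point correspondences (illustrative sketch).
    function H = estimate_homography(q1, q2)
        n = size(q1, 2);
        B = zeros(2*n, 9);
        for i = 1:n
            p = [q1(:,i)' 1];                    % source point as a homogeneous row vector
            x = q2(1,i);  y = q2(2,i);           % target point coordinates
            B(2*i-1,:) = [p, zeros(1,3), -x*p];  % constraint from the x-coordinate
            B(2*i,  :) = [zeros(1,3), p, -y*p];  % constraint from the y-coordinate
        end
        [~,~,v] = svd(B);                        % homogeneous least squares via the SVD
        H = reshape(v(:,end), 3, 3)';            % the rows of H are stacked in v(:,end)
    end

The warp in Figure 2.10 could then, for instance, be produced with the Image Processing Toolbox, e.g. imwarp(im1, projective2d(H')), before averaging the two images.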

2.4 Point Triangulation

One of the classical tasks of 3D inference from cameras is making 3D measurements from two or more known cameras. This is known as triangulation, and in the case of points, point triangulation. Specifically, we have a set of cameras for which we know the internal and external calibration of all the cameras, and we want to find the position of an object that is identified in all cameras. This boils down to finding the coordinates of a 3D point $Q$ from its known projections, $q_i$, $i = 1, \ldots, n$, in $n$ known cameras, $P_i$. This will be covered here, firstly through a linear algorithm, upon which it is discussed how the estimate can become statistically more meaningful.

³The homography has been scaled by a factor of 40 ($\text{diag}(40,40,1) \cdot H$), such that each chess board square would be 40×40 pixels instead of 1×1.

Figure 2.11: The result of point triangulation is the 3D point, $Q$, closest to the back-projections of the observed 2D points.

The basis of point triangulation, as seen in Figure 2.11, is that the back-projected line of each observed 2D point, $q_i$, forms a constraint on the position of the 3D point, $Q$. So in the absence of noise, one would only have to find the intersection of these 3D lines. Two or more lines do not in general intersect in 3D, so with noise we need to find the 3D point that is closest to these lines. What is meant by closest will be discussed in Section 2.4.3.
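To make the notion of a back-projected line explicit, it can be parameterized directly; a short sketch, using a partition $P_i = [A_i \; b_i]$ into a $3 \times 3$ block and a 3-vector (notation introduced here for illustration only), with $q_i = [x_i, y_i, 1]^\top$ and assuming $A_i$ is invertible:

$$
Q(\lambda) = \begin{bmatrix} -A_i^{-1} b_i \\ 1 \end{bmatrix} + \lambda \begin{bmatrix} A_i^{-1} q_i \\ 0 \end{bmatrix} \ , \qquad \lambda \in \mathbb{R} \ ,
$$

where the first term is the camera center and the second the direction of the ray through the observed point; indeed $P_i Q(\lambda) = \lambda q_i \propto q_i$ for $\lambda \neq 0$.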

2.4.1 A Linear Algorithm

Here a linear algorithm for 3D point triangulation is presented. To ease notation, the rows of the $P_i$ will here be denoted by a superscript, i.e.

$$
P_i = \begin{bmatrix} P_i^1 \\ P_i^2 \\ P_i^3 \end{bmatrix} \ ,
$$

and thus the pinhole camera model, (1.21), can be expanded as follows

$$
q_i = \begin{bmatrix} s_i x_i \\ s_i y_i \\ s_i \end{bmatrix}
    = \begin{bmatrix} P_i^1 \\ P_i^2 \\ P_i^3 \end{bmatrix} Q
\;\Rightarrow\;
s_i x_i = P_i^1 Q \ , \quad s_i y_i = P_i^2 Q \ , \quad s_i = P_i^3 Q
$$
$$
\Rightarrow\quad
x_i = \frac{s_i x_i}{s_i} = \frac{P_i^1 Q}{P_i^3 Q} \ , \qquad
y_i = \frac{s_i y_i}{s_i} = \frac{P_i^2 Q}{P_i^3 Q} \ . \qquad (2.21)
$$

Doing a bit of arithmetic on (2.21) results in

$$
x_i = \frac{P_i^1 Q}{P_i^3 Q} \ , \quad y_i = \frac{P_i^2 Q}{P_i^3 Q}
\;\Rightarrow\;
P_i^3 Q\, x_i = P_i^1 Q \ , \quad P_i^3 Q\, y_i = P_i^2 Q
\;\Rightarrow\;
P_i^3 Q\, x_i - P_i^1 Q = 0 \ , \quad P_i^3 Q\, y_i - P_i^2 Q = 0
$$
$$
\Rightarrow\quad
\left( P_i^3 x_i - P_i^1 \right) Q = 0 \ , \qquad
\left( P_i^3 y_i - P_i^2 \right) Q = 0 \ . \qquad (2.22)
$$

Here (2.22) is seen to be linear constraints in $Q$. Since $Q$ has three degrees of freedom we need at least three such constraints to determine $Q$. This corresponds to projections in at least two known cameras, since each camera in general poses two linear constraints. Comparing with Section 1.1 it is seen that the $x$ and $y$ parts of (2.22) correspond to planes. E.g. knowing the $x$-coordinate of $Q$'s image in a given camera defines a plane with coefficients $P_i^3 x_i - P_i^1$, which $Q$ lies on. The intersection of the two planes posed by the $x$ and $y$ coordinates is the 3D line corresponding to the back-projection of the 2D point $q_i$.
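As a quick numerical sanity check of (2.22), an exact (noise-free) projection makes both constraints vanish; the camera and point below are made up purely for illustration.

    P = [1000 0 500 0; 0 1000 400 0; 0 0 1 10];    % an arbitrary 3x4 camera matrix
    Q = [0.3; -0.2; 5; 1];                         % an arbitrary 3D point, homogeneous
    q = P*Q;  x = q(1)/q(3);  y = q(2)/q(3);       % project and dehomogenize
    [(P(3,:)*x - P(1,:))*Q, (P(3,:)*y - P(2,:))*Q] % both entries evaluate to (numerically) zero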

The way $Q$ is calculated from all these linear constraints is to stack them all in a matrix⁴

$$
B = \begin{bmatrix}
P_1^3 x_1 - P_1^1 \\
P_1^3 y_1 - P_1^2 \\
\vdots \\
P_n^3 x_n - P_n^1 \\
P_n^3 y_n - P_n^2
\end{bmatrix} \ .
$$

Then (2.22) is equivalent to

$$
BQ = 0 \ .
$$

With noisy measurements, however, this will not hold perfectly, and we instead solve

$$
\min_Q \|BQ\|_2^2 \ . \qquad (2.23)
$$

This is seen to be a homogeneous least squares problem, which is straightforward to solve, cf. Appendix A, as illustrated in the following MATLAB code

[u,s,v]=svd(B);  % singular value decomposition of the stacked constraint matrix
Q=v(:,end);      % the right singular vector of the smallest singular value minimizes ||B*Q|| for ||Q|| = 1
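To make the whole algorithm concrete, the constraint stacking of (2.22) and the SVD solution of (2.23) can be collected in one routine. The following is a minimal sketch rather than code from the notes; the function name and the argument layout, a cell array P of the n camera matrices and a 2×n matrix q of the corresponding image points, are chosen here for illustration.

    % Linear point triangulation, cf. Section 2.4.1 (illustrative sketch).
    % P: cell array of n 3x4 camera matrices, q: 2xn matrix of image points.
    function Q = triangulate_linear(P, q)
        n = numel(P);
        B = zeros(2*n, 4);
        for i = 1:n
            B(2*i-1,:) = q(1,i)*P{i}(3,:) - P{i}(1,:);  % x-constraint of (2.22)
            B(2*i,  :) = q(2,i)*P{i}(3,:) - P{i}(2,:);  % y-constraint of (2.22)
        end
        [~,~,v] = svd(B);   % homogeneous least squares, cf. (2.23)
        Q = v(:,end);       % unit-norm minimizer of ||B*Q||
        Q = Q / Q(4);       % dehomogenize to [X; Y; Z; 1]
    end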

2.4.2 An Example

Figure 2.12: Two images with known camera models. Two points corresponding to the same 3D point have been annotated, such that this 3D point can be estimated.

As an example of point triangulation, consider the case in Figure 2.12. Here two points, $q_1$ and $q_2$, corresponding to the same 3D point have been annotated in the two images⁵, and the corresponding cameras are given by

$$
P_1 = \begin{bmatrix}
3274 & -447 & -1027 & 47431 \\
1120 & 2952 & 848 & 6798 \\
1 & 0 & 1 & 4
\end{bmatrix} \ , \qquad
P_2 = \begin{bmatrix}
3315 & 314 & 941 & 11949 \\
398 & 3024 & 1177 & -2417 \\
0 & 0 & 1 & -2
\end{bmatrix} \ .
$$

⁴If $n = 2$ there are naturally only four rows of $B$, corresponding to $P_1$ and $P_2$.

⁵This example is made from real data; as such, rounding errors occur.

The linear equations in the form of $B$ are then given by

$$
B = \begin{bmatrix}
-2068 & 212 & 2342 & -39501 \\
-631 & -3047 & -314 & -3582 \\
-3160 & 240 & -27 & -13571 \\
-298 & -3071 & -587 & 1371
\end{bmatrix} \ .
$$

The solution to (2.23), computed via the SVD of $B$ as above, is the homogeneous vector $Q$; dividing through by its fourth coordinate then gives the linear estimate of the 3D point.
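As a sanity check, the rounded numbers above can be fed directly into the SVD solution from Section 2.4.1; because of the rounding mentioned in footnote 5, the result only approximates the true point.

    B = [-2068   212  2342 -39501;    % stacked constraints from the example above
          -631 -3047  -314  -3582;
         -3160   240   -27 -13571;
          -298 -3071  -587   1371];
    [~,~,v] = svd(B);                 % solve (2.23)
    Q = v(:,end);                     % homogeneous solution
    Q = Q / Q(4)                      % linear estimate of the 3D point (displayed)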

2.4.3 A Statistical Issue*

As mentioned, there is an issue with the linear algorithm presented in Section 2.4.1. This issue is that the pinhole camera model, as expanded in (2.21), and the localization of the 2D points $q_i$ are assumed to be perfect and without noise. This is not realistic. Thus a more accurate version of (2.21) should look as follows

$$
x_i = \frac{P_i^1 Q}{P_i^3 Q} + \varepsilon_{x_i} \ , \qquad
y_i = \frac{P_i^2 Q}{P_i^3 Q} + \varepsilon_{y_i} \ . \qquad (2.24)
$$

Here $(\varepsilon_{x_i}, \varepsilon_{y_i})$ is the noise on the 2D point location, $(x_i, y_i)$. Redoing the calculations of (2.22) based on (2.24) instead of (2.21) gives

$$
\left( P_i^3 x_i - P_i^1 \right) Q = P_i^3 Q\, \varepsilon_{x_i} \ , \qquad
\left( P_i^3 y_i - P_i^2 \right) Q = P_i^3 Q\, \varepsilon_{y_i} \ ,
$$

where $P_i^3 Q$ is equal to the distance from the camera to the 3D point, as can be seen from the derivation in Section 1.4. So with the error model as in (2.24), the minimization problem in (2.23) is equivalent to

$$
\min_Q \|BQ\|_2^2 = \min_Q \sum_{i=1}^n \left( P_i^3 Q \right)^2 \left( \varepsilon_{x_i}^2 + \varepsilon_{y_i}^2 \right) \ .
$$

That is, observations made by cameras further from the 3D point are weighted more. Given that the error model in (2.24) is the correct one, this is not a meaningful quantity to minimize. It is, however, the result of a linear algorithm which is computationally feasible and in general gives decent results. Algorithms with this property are said to minimize an algebraic error measure.
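For contrast, a statistically more meaningful estimate under the error model (2.24) is obtained by minimizing the reprojection error itself, typically with a nonlinear least squares solver started from the linear estimate. The sketch below is not from the notes; it assumes MATLAB's Optimization Toolbox (lsqnonlin) and the same P, q layout as in the earlier sketch, with Qlin the dehomogenized linear estimate.

    % Refine the linear estimate by minimizing the reprojection error of (2.24).
    % P: cell array of n cameras, q: 2xn image points, Qlin: [X;Y;Z;1] from the linear algorithm.
    function Q = triangulate_reproj(P, q, Qlin)
        n = numel(P);
        Xhat = lsqnonlin(@residuals, Qlin(1:3));  % nonlinear least squares refinement
        Q = [Xhat; 1];
        function r = residuals(X)                 % X: candidate 3D point (inhomogeneous)
            r = zeros(2*n, 1);
            for i = 1:n
                p = P{i} * [X; 1];                % project into camera i
                r(2*i-1) = q(1,i) - p(1)/p(3);    % epsilon_{x_i} of (2.24)
                r(2*i)   = q(2,i) - p(2)/p(3);    % epsilon_{y_i} of (2.24)
            end
        end
    end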

On The Error Model*

The error model given in (2.24) basically states that the meaningful error is on the image point location, cf. Figure 2.13-Left. The motivation for this is that the two main sources of error are identifying where in an image an entity is actually seen, cf. Figure 2.13-Right, and unmodelled optical phenomena, such as radial distortion. Both these sources are captured well by a deviation in the point location, as in (2.24).
