
Multiple View Geometry

As in the two-camera case, geometrical considerations can be used to calculate the 3D structure from known matches in the images. The fundamental matrix used to describe the relation between two views is sometimes known as the bifocal tensor. The idea can be extended to three and four views to obtain the trifocal tensor and the quadrifocal tensor. In general, m views can be described using an m-focal tensor; however, as we shall see, it makes less sense to describe the camera geometry this way as the number of views grows.

As the two-view geometry showed, the image coordinates of the projections of a point in space M are all that is needed to calculate it. In the case of three cameras and three projections of M, more information is added, and thus one does not need all the information from all cameras to calculate M. It can be shown ([27]) that only 4 parameters are needed: in the two-view case these are the 4 coordinates of the two projections; in the three-view case, two coordinates from one of the views and one coordinate from each of the other two views. Thus with more than 4 cameras the problem becomes overconstrained, which would result in not all cameras being used directly. It is of course a waste of information to use only 4 parameters when calculating M. The next sections describe different approaches to making use of the extra information.

5.2.1 The Factorization Method

If the image coordinates are subject to noise, the extra information given by multiple views can be used to estimate M with more accuracy. This elegant method, called factorization, is in some sense an extension of the idea behind the 8 point algorithm, which estimates the multiple-view equivalent of the fundamental matrix.

It was first described by Tomasi and Kanade in [54] for orthogonal views of the scene, but later rewritten by P. Sturm and B. Triggs in [53] to the projective case as described here.

The idea is that having m cameras defined by their unknown camera matrices $P_1 \dots P_m$ and the image coordinates of n 3D points $M_1 \dots M_n$, one can describe the projection of each point into each camera by

\[
\zeta_{ij}\, m_i^j = P_i M_j, \qquad i = 1 \dots m,\; j = 1 \dots n \tag{5.1}
\]

where $\zeta_{ij}$ is the projective depth of point $j$ in camera $i$. These $n \times m$ equations can be stacked into matrix form, giving

\[
W = \begin{bmatrix} \zeta_{11} m_1^1 & \cdots & \zeta_{1n} m_1^n \\ \vdots & & \vdots \\ \zeta_{m1} m_m^1 & \cdots & \zeta_{mn} m_m^n \end{bmatrix}
  = \begin{bmatrix} P_1 \\ \vdots \\ P_m \end{bmatrix}
    \begin{bmatrix} M_1 & \cdots & M_n \end{bmatrix} = P S
\]
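As a small concrete illustration of equation (5.1), the sketch below uses an arbitrary, made-up camera matrix and world point (the values carry no special meaning) to show how the projective depth relates the homogeneous image point to the product $P_i M_j$:

```python
import numpy as np

# An illustrative 3x4 camera matrix P and a homogeneous world point M.
P = np.array([[1.0, 0.0, 0.0, 0.5],
              [0.0, 1.0, 0.0, -0.2],
              [0.0, 0.0, 1.0, 4.0]])
M = np.array([1.0, 2.0, 3.0, 1.0])

x = P @ M          # equals zeta * m, a homogeneous 3-vector
zeta = x[2]        # the projective depth of this point in this camera
m_img = x / zeta   # the measured image point, third coordinate scaled to 1

# zeta * m_img reconstructs P @ M exactly, which is equation (5.1).
```

Scaling the measured point by its projective depth recovers the right-hand side of (5.1), which is exactly the relation the factorization method exploits.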

In this form only the measured image points in W are known. If, however, we assume that the ζ's are known, then W is known and can be decomposed into $U D V^T$ using singular value decomposition. Using the arguments of [53], these factors can be used to calculate P and S. If, on the other hand, P and S are known, then the ζ's can be calculated. The factorization method is thus to initialize the ζ's to one, and iteratively calculate P, S and the ζ's until convergence.
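The alternation described above can be sketched in a few lines of numpy on synthetic noise-free data. This is only an illustrative skeleton, not the full method of [53] — in particular it omits the normalization (balancing) of W that a practical implementation needs for numerical stability, and the simple depth update (reading the new ζ's off the third row of each $P_i M_j$) is one of several possible choices. All variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noise-free data: m random projective cameras and
# n world points in homogeneous coordinates.
m, n = 4, 20
P_true = rng.standard_normal((m, 3, 4))
M_true = rng.standard_normal((4, n))

# "Measured" image points, scaled so their third coordinate is 1,
# which divides out the true projective depths.
x = np.zeros((m, 3, n))
for i in range(m):
    proj = P_true[i] @ M_true
    x[i] = proj / proj[2]

# Iterate: with the depths fixed, W factors into rank 4 via SVD;
# with the factors fixed, new depths are read off the third row
# of each P_i M_j (since the image points have third coordinate 1).
zeta = np.ones((m, n))
for _ in range(30):
    W = np.vstack([zeta[i] * x[i] for i in range(m)])   # (3m) x n
    U, D, Vt = np.linalg.svd(W, full_matrices=False)
    P_hat = U[:, :4] * D[:4]    # stacked camera matrices, (3m) x 4
    S_hat = Vt[:4]              # estimated structure, 4 x n
    for i in range(m):
        zeta[i] = (P_hat[3 * i:3 * i + 3] @ S_hat)[2]   # depth update

# After convergence the rescaled measurement matrix should be close
# to rank 4, i.e. well approximated by the product P_hat @ S_hat.
W = np.vstack([zeta[i] * x[i] for i in range(m)])
residual = np.linalg.norm(W - P_hat @ S_hat) / np.linalg.norm(W)
```

Note that, as with the two-view case, the recovered P and S are only defined up to a common projective transformation.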

The results using the factorization method are good; however, there are some drawbacks. As in the two-view case, the result is a point cloud of unrelated points and must be triangulated in some way before a real 3D model is obtained. Furthermore, the feature matching is error prone, and occlusion in the images makes some of the information unreliable. These problems can be addressed to make a more robust factorization; an example of such research can be found in [8].

5.2.2 Using Bundle Adjustment

A general approach to using all the information is the so-called bundle adjustment, first discussed by D. Brown in 1976 [14]. The name of this method comes from

the underlying idea, which involves adjusting the bundle of rays between each camera center and the set of 3D points. It is thus a method for calculating both the 3D points and the geometry of the cameras at the same time. The problem is approached using a minimization scheme that minimizes the squared geometric error between all measured image points and the projected world points. Reusing the notation of the last section, the goal is to minimize

\[
\min_{P_i,\, M_j} \sum_{i=1}^{m} \sum_{j=1}^{n} \left\| m_i^j - P_i M_j \right\|^2
\]

Minimizing such a problem can be a huge task, even with relatively few points and cameras. For example, having 10 cameras and 100 world points corresponds to a minimization problem with more than 300 variables to solve for concurrently. The nature of the problem, however, allows for a much faster minimization. Each residual in the problem, i.e. $\| m_i^j - P_i M_j \|^2$, only depends on a single point and a single view. This can be exploited to reduce the computational effort used to minimize the problem considerably. The actual minimization can be done using any minimization technique; however, to avoid getting stuck in a local minimum, Levenberg-Marquardt minimization can be used. Even though the problem has been reduced, it still requires an initial guess, which makes it a suitable choice for polishing the solution of other, faster methods. The image points used in this problem should be of high quality, or at least be distributed evenly around the correct image points.
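The sparsity structure described above can be demonstrated with a small synthetic sketch. This is not a full bundle adjuster: for brevity the cameras are parameterized directly by their 12 matrix entries (a real system would use a minimal parameterization), and scipy's sparse-aware trust-region solver stands in for a dedicated sparse Levenberg-Marquardt implementation. All names and sizes are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.sparse import lil_matrix

rng = np.random.default_rng(1)

m, n = 3, 15   # cameras and world points (toy sizes)

# Ground truth: cameras P_i = [I | t_i] and points at positive depth.
P_true = np.stack([np.hstack([np.eye(3), 0.1 * rng.standard_normal((3, 1))])
                   for _ in range(m)])
M_true = rng.standard_normal((3, n))
M_true[2] += 5.0

def project(P, M3):
    """Project inhomogeneous 3D points (3 x n) with a 3x4 camera matrix."""
    x = P @ np.vstack([M3, np.ones((1, M3.shape[1]))])
    return x[:2] / x[2]

# Noisy image observations of every point in every view.
obs = np.stack([project(P_true[i], M_true) for i in range(m)])
obs += 0.01 * rng.standard_normal(obs.shape)

def residuals(p):
    # Unpack all camera and point parameters from one flat vector.
    Ps = p[:12 * m].reshape(m, 3, 4)
    Ms = p[12 * m:].reshape(3, n)
    return np.concatenate(
        [(project(Ps[i], Ms) - obs[i]).T.ravel() for i in range(m)])

# Jacobian sparsity: the two residuals of (view i, point j) depend only
# on camera i's 12 parameters and point j's 3 coordinates.
sparsity = lil_matrix((2 * m * n, 12 * m + 3 * n), dtype=int)
for i in range(m):
    for j in range(n):
        r = 2 * (i * n + j)
        sparsity[r:r + 2, 12 * i:12 * i + 12] = 1
        sparsity[r:r + 2, 12 * m + 3 * j:12 * m + 3 * j + 3] = 1

# Start from the ground truth (standing in for the initial guess that a
# faster method would provide) and polish against the noisy observations.
p0 = np.concatenate([P_true.ravel(), M_true.ravel()])
result = least_squares(residuals, p0, jac_sparsity=sparsity, method="trf")
```

Passing the sparsity pattern lets the solver avoid forming the dense Jacobian, which is the same structural observation that makes dedicated sparse bundle adjustment implementations scale to many cameras and points.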

One of the advantages of this method is its generality. It can handle features that are not visible in all views simply by leaving the corresponding residuals out of the minimization problem. It cannot, however, in its basic form detect and discard outliers.

A new and thorough description of bundle adjustment can be found in [55].

5.2.3 Using Lines and Curves

Points are not the only useful features detectable in an image. Some research has been done on extending the theory to lines, patches and curves of contours in the images. These primitives are larger, which makes them easier to match correctly. Furthermore, they do not represent only a single point in space, and thus also contain information about the connectivity of the surface.

It is, however, rather difficult to identify such primitives in a deterministic way.