Dense Stereo Matching - Surface Estimation From Multiple Images

point cloud, e.g. using Delaunay triangulation. This may not result in the correct connectivity; however, Morris and Kanade proposed that the initial triangulation could be refined by iteratively evaluating the reprojection of the images onto the mesh while the connectivity is changed [45]. Thus both the image data and the point cloud would be used in the surface estimation, giving a photorealistic result.

4.3 Dense Stereo Matching

As opposed to feature matching, dense stereo matching methods do not attempt to find the best points to match. Instead, every pixel is compared against all pixels along the corresponding epipolar line in the other image. Dense stereo methods therefore require that the cameras are calibrated beforehand. To increase performance, it is normally required that the images are aligned so that the epipolar lines are simply scan lines of the input images. If they are not, one or both of the images can be transformed to achieve this, which is called rectification of the images.

The result of the matching is a disparity map, describing for each pixel the disparity to the best match found in the other image. The disparity map can be in either pixel precision or subpixel precision, if the matching method supports it.
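One common way to obtain subpixel precision, given here only as an illustrative sketch and not necessarily the approach of the methods surveyed, is to fit a parabola through the matching cost at the best integer disparity and its two neighbours. The function name `subpixel_offset` and its parameters are hypothetical.

```python
def subpixel_offset(cost_prev, cost_best, cost_next):
    """Offset of the minimum of the parabola through three matching costs.

    cost_best is the cost at the best integer disparity d; cost_prev and
    cost_next are the costs at d - 1 and d + 1. When cost_best is the
    smallest of the three, the offset lies within [-0.5, 0.5].
    """
    denominator = cost_prev - 2.0 * cost_best + cost_next
    if denominator <= 0:
        # Flat or non-convex cost curve: keep pixel precision.
        return 0.0
    return 0.5 * (cost_prev - cost_next) / denominator
```

The refined disparity is then the integer disparity plus the returned offset.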

Using the epipolar geometry, the disparity map can be transformed into a depth field, which for every pixel contains the estimated depth to the object. The depth field representation of the surface is normally regarded as only 2½D, as it cannot describe anything behind the surface covering the image. It can, however, be transformed to e.g. a triangular mesh, which can be post-processed using techniques like those used in feature matching, or a mesh simplification technique like [56].
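The transformation from disparity to depth can be sketched as follows, assuming a rectified pair of identical pinhole cameras with parallel optical axes, for which the depth is Z = f * B / d. The names `focal_length_px` (focal length in pixels) and `baseline` (distance between the camera centres) are assumptions introduced for this illustration.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline):
    """Depth Z = f * B / d for a rectified, parallel-axis stereo pair.

    Returns None when no valid disparity is available (None or <= 0),
    since zero disparity corresponds to a point at infinity.
    """
    if disparity_px is None or disparity_px <= 0:
        return None
    return focal_length_px * baseline / disparity_px


def depth_field(disparity_map, focal_length_px, baseline):
    """Convert a disparity map (list of rows) to a depth field."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline) for d in row]
        for row in disparity_map
    ]
```

The depth is returned in the same unit as the baseline.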

A wide range of methods exists for doing the actual matching. Some methods take the connectivity into consideration, such that two neighboring pixels have disparities relatively close to each other, while others smooth the disparity map to remove outliers. Most methods use some form of window that is slid along the corresponding scan lines. A good survey and comparison of the available methods can be found in [51].
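As a minimal sketch of window-based matching, the code below matches one pair of rectified scan lines using the sum of absolute differences (SAD) as the cost. SAD and the winner-takes-all selection are chosen here for simplicity and are only one of the many options compared in the survey; the function names and parameters are hypothetical.

```python
def sad(left_row, right_row, x_left, x_right, half_window):
    """Sum of absolute differences between two 1-D windows."""
    return sum(
        abs(left_row[x_left + k] - right_row[x_right + k])
        for k in range(-half_window, half_window + 1)
    )


def match_scanline(left_row, right_row, max_disparity, half_window=1):
    """Return one disparity per pixel of left_row (None near the borders)."""
    width = len(left_row)
    disparities = [None] * width
    for x in range(half_window, width - half_window):
        best_cost, best_d = None, None
        for d in range(max_disparity + 1):
            # A pixel in the left image appears further left in the right image.
            x_right = x - d
            if x_right - half_window < 0:
                break  # the window would leave the right image
            cost = sad(left_row, right_row, x, x_right, half_window)
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities
```

Each pixel is matched independently here, so neither the connectivity constraint nor the smoothing mentioned above is applied.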


4.3.1 Depth Resolution

When designing stereo setups for dense stereo matching, or stereo vision in general, some consideration must be given to the desired depth resolution. The depth resolution increases with the image resolution, since a higher resolution allows more precise measurements. A good rule for obtaining satisfactory results is that the ratio baseline/depth stays within [1/3, 3] [19].

Figure 4.3: Illustration of the baseline and depth in a stereo setup.

The baseline-to-depth ratio is illustrated in figure 4.3, which gives an intuitive understanding of this rule. If the object is too far away, the differences between the two images are too small to estimate details in the object. On the other hand, if the object is too close, it either falls outside the field of view of the cameras or too much occlusion occurs. Furthermore, materials tend to reflect similar amounts of light toward closely related viewing angles, which makes matching easier. The choice is thus a trade-off between the achievable depth quality and the matching reliability.
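The trade-off can be made concrete with a small sketch, assuming the rectified pinhole model Z = f * B / d. Differentiating with respect to the disparity d shows that one disparity step corresponds to a depth step of roughly Z^2 / (f * B). The function names and the parameter `disparity_step_px` are assumptions introduced for this illustration; the bounds 1/3 and 3 are the rule of thumb stated above.

```python
def depth_resolution(depth, focal_length_px, baseline, disparity_step_px=1.0):
    """Approximate depth change caused by one disparity step.

    Follows from Z = f * B / d, so |dZ/dd| = Z**2 / (f * B).
    """
    return depth * depth / (focal_length_px * baseline) * disparity_step_px


def baseline_depth_ratio_ok(baseline, depth, low=1.0 / 3.0, high=3.0):
    """Check the rule of thumb that baseline/depth stays within [1/3, 3]."""
    return low <= baseline / depth <= high
```

For a fixed depth, a longer baseline or a higher image resolution (a larger focal length in pixels) gives a smaller depth step, while a subpixel disparity step scales the result linearly.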

Chapter 5

Multiple View Vision

As the previous chapter showed, stereo vision can be used to estimate the depth from the two cameras to the object in the scene. From this, a 3D shell of the object as viewed from the cameras can be produced. It cannot, however, in itself be used to reconstruct a whole object as viewed from every angle. Furthermore, the depth resolution of a single stereo rig tends to be low. The direct way to address this is of course to increase the amount and distribution of input data, which in this case means adding more input images.

Using more than two input images gives rise to a new problem that can be approached in a number of different ways. This chapter describes some of the more popular and interesting methods that have been proposed to solve the multiple image surface estimation problem.

5.1 Dense stereo matching and stitching

While stereo vision can be used to estimate the distance from the cameras to the surface, it cannot in itself reconstruct the full scene. The shells of the surface produced by stereo vision can, however, be stitched together to recover the full surface. Thus a simple multiple view vision method is to use stereo vision on pairs of the input images and stitch the results together, see e.g. [44]. This method, however, does not use the global mutual information available in more than two images. Where the shells overlap, some averaging can be done, but this is not as strong as a real multiple view surface estimation method that takes all views into account at the same time.
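The averaging of overlapping shells can be sketched under a strong simplifying assumption: the shells have already been registered and resampled onto a common grid of depth values, with None marking cells where a shell has no estimate. In practice the shells come from different viewpoints and must first be aligned; the function `merge_shells` is hypothetical and only illustrates the averaging step.

```python
def merge_shells(shells):
    """Average per-cell depth over all shells that have an estimate there.

    shells is a non-empty list of equally sized grids (lists of rows).
    Cells covered by no shell stay None in the result.
    """
    rows, cols = len(shells[0]), len(shells[0][0])
    merged = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [s[r][c] for s in shells if s[r][c] is not None]
            if values:
                merged[r][c] = sum(values) / len(values)
    return merged
```

Each cell is treated independently, which reflects the limitation noted above: the images themselves are not revisited, so no information shared across more than two views is exploited.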

Using dense stereo matching for multiple view vision requires either that the cameras are calibrated beforehand or that the cameras are naturally paired with a known mutual calibration. In the latter case, the best fit found when stitching the shells together can be used to calculate the external calibration of the cameras.
