
End Notes

In document Lecture Notes on Computer Vision (Pages 56-61)

As mentioned in the introduction to this chapter, only a small subset of multiple view geometry is covered here. To the author's best judgment, the parts chosen are the most introductory, and the ones most often used in practical 3D estimation with cameras. There are, however, two additional issues that the reader should be aware of. Firstly, the estimation algorithms mentioned here are the easiest to understand and the 'standard' ones. They, however, either minimize an algebraic error or, in the case of the non-linear algorithms, are iterative and risk converging to a local minimum. Recently, algorithms have been developed that solve such optimization problems in a guaranteed optimal way, cf. e.g. [17]. The second issue not dealt with much here is the geometry of calibrated cameras, i.e. where the A are known. This case has large practical importance, but unfortunately often gives rise to high-order polynomial equations in many variables. As such, these algorithms are beyond the scope of this text; a few references are [12, 21, 26].

Part II

Image Features and Matching


Chapter 3

Extracting Features

Computer vision and image analysis are mostly concerned with doing inference from images. In doing so we often resort to extracting features from the images, such as points and lines, which we believe have some intrinsic meaning. One of the most important uses of such features is in helping to solve the correspondence problem. This is the problem of finding the correspondence between the 'same thing' depicted in two different images, and is the subject of Chapter 4. This chapter, thus, serves as a prelude to the next. There are naturally many types of features to be extracted from images, such as cars, letters and so on, but here the focus is on point and line features. The motivation behind this is that these are the most commonly used in a general setting. A good reference on feature extraction for matching is [27]. As for the organization, the first two sections, namely Section 3.1 and Section 3.2, present some general considerations in relation to feature extraction, upon which two point detection schemes and a line detection scheme are presented in Sections 3.3, 3.4 and 3.5, respectively.

Figure 3.1: Things appear at a scale, as illustrated in the above images. This scale can also vary, as e.g. the leaves of the bush above. When extracting features, we often have to choose which scale we want to work at. This is naturally related to what we want to extract, e.g. the leaves, the branches or the trees.

3.1 Importance of Scale

Figure 3.2: The maximum of the gradient magnitude is computed for the leftmost image and depicted in the middle and right images. The difference between the middle and right image is the scale at which the gradient is computed. Notice how different structures appear at the different scales.

The things or features that we want to extract from an image appear at a scale, cf. Figure 3.1. This scale depends e.g. on the camera position relative to the object, the camera optics, and the actual size of the object photographed. When extracting features – as well as performing many other image processing tasks – we have to choose what scale we want to work on. This is illustrated in Figure 3.2, where different scales extract different structures.

To explain the choice of scale in more detail consider the one dimensional signal in Figure 3.3. When detecting edges on this signal using a filter, we find one or several edges depending on the extent of this filter.

If we use a small filter, all the ridges, which might otherwise be viewed as noise, will give an edge response. If we use a large filter, only the one dominant edge will give a response. The dominant edge will, furthermore, not be detected by a small filter, because this edge only becomes dominant at a larger scale. So one view of scale relates to the size of the filters used. These filters are very often used to perform basic and preliminary image processing. Another view is that scale, and the size of the filters used, distinguishes between what is noise and what is signal, in a frequency, or Fourier, setting.

Figure 3.3: Top: A sample one dimensional signal with one or more edges. Middle: The sample signal filtered with a small step edge. Bottom: The sample signal filtered with a large step edge.
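The small-versus-large filter behaviour described above can be sketched numerically. The following is a minimal illustration, not taken from the text: the signal (a step with small deterministic ripples standing in for noise) and the moving-average filter widths are assumptions chosen only to make the effect visible.

```python
import numpy as np

# A hypothetical 1D signal: one dominant step edge plus small ripples
# that play the role of noise (chosen deterministic for reproducibility).
x = np.concatenate([np.zeros(100), np.ones(100)]) + 0.05 * np.sin(np.arange(200.0))

def edge_response(signal, width):
    """Smooth with a moving-average filter of the given width, then take
    the absolute discrete derivative as a simple edge detector."""
    kernel = np.ones(width) / width
    smoothed = np.convolve(signal, kernel, mode="same")
    return np.abs(np.diff(smoothed))

small = edge_response(x, 3)    # small filter: the ripples also respond
large = edge_response(x, 31)   # large filter: only the dominant edge survives

# Away from the edge, only the small-scale response shows structure;
# the large-scale response peaks near the true edge at index 100.
print(int(np.argmax(large[20:180])) + 20)
```

At the small scale the ripples produce many spurious edge responses, while the wide filter averages them away and responds essentially only near the dominant step, which matches the discussion of Figure 3.3.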

Image scale and the size of filters are naturally associated with image pyramids. An image pyramid is the successive downscaling of an image, cf. Figure 3.4. This is referred to as a pyramid in that placing the images on top of each other gives a pyramid, due to the decreasing size of the images. The relationship to image scale and filter size is that, instead of making the filters larger, the image can be made smaller. So, in effect, running the same filter on a downsampled image is equivalent to working on a coarser scale. Although this is a somewhat crude technique for working with image scale, it is often used because reducing the image size reduces the computational requirements. This is opposed to increasing the filter size, which most often increases the computational requirements.
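The successive downscaling can be sketched as follows. This is a deliberately simple assumed scheme – plain 2x2 block averaging – whereas practical pyramids (and possibly the one in Figure 3.4) typically blur before subsampling:

```python
import numpy as np

def downscale(img):
    """Halve each dimension by averaging 2x2 blocks (a crude assumed
    downsampling step; real pipelines usually smooth first)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img, levels):
    """Return the image together with its successively downscaled copies."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downscale(out[-1]))
    return out

levels = pyramid(np.arange(64 * 64, dtype=float).reshape(64, 64), 4)
print([lvl.shape for lvl in levels])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Running the same small filter on each level then corresponds to filtering the original image at coarser and coarser scales, at a fraction of the cost of enlarging the filter.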

3.1.1 Gaussian Kernels

When dealing with image scale, or scale space, a workhorse is the Gaussian filter, cf. Figure 3.5. A Gaussian filter is given by the following equation

gσ = g(x, σ) = 1/(2πσ²)^(D/2) · exp(−||x||₂² / (2σ²)),   (3.1)

where D is the dimension of x.
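A sampled version of Equation (3.1) is easy to construct. The sketch below assumes D = 1 and a common truncation at about 4σ; the renormalization at the end is a practical detail not part of the equation, ensuring the discrete kernel sums to one:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sampled 1D Gaussian g(x, sigma) from Equation (3.1) with D = 1,
    truncated at roughly 4*sigma and renormalized to sum to one."""
    if radius is None:
        radius = int(np.ceil(4 * sigma))  # assumed truncation radius
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return g / g.sum()  # discrete renormalization

g = gaussian_kernel(2.0)
print(len(g))  # 17
```

Convolving an image with such a kernel (once per axis, since the Gaussian is separable) produces the smoothed, coarser-scale versions of the image discussed above.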
