
2.4 The Proposed Method

2.4.2 Classification

The segmented image from the previous step separates skin or skin-like regions from the non-skin ones. The system now goes through these skin or skin-like regions and classifies each as face or non-face. The main classifier is a neural network. The advantage of neural networks as classifiers in the face detection problem is their ability to automatically extract the characteristics of complicated face templates [9]. However, this automatic ability comes at the cost of heavy computation between the layers of the network. Furthermore, the topology of the network influences both the computational cost and the ability to achieve acceptable results.

In order to reduce the computation of the system, we prevent the neural network from blindly scanning all skin or skin-like regions in the segmented image. Instead, we use the neural network within a cascaded classifier, which consists of two different classifiers. At the first level, a fuzzy inference engine, which we call the pre-classifier, goes through the skin or skin-like regions and makes an initial decision; only the regions it accepts are passed on to the neural network.


Figure 2-4: (Left) input color image, (middle) probability image and (right) segmented image

2.4.2.1 Pre-classifier: Fuzzy Inference Engine

By scanning the binary image from the previous step from the top left to the bottom right, each group of connected pixels is considered a region. Based on the features extracted for each segmented region, the pre-classifier makes the initial decision, i.e., face or not.
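The region-grouping step above can be sketched as a standard connected-component scan over the binary mask. This is an illustrative implementation (not the authors' code), using 4-connectivity and a breadth-first flood fill:

```python
# Sketch: extract connected regions from a binary skin mask by scanning
# top-left to bottom-right and flood-filling each unvisited foreground pixel.
from collections import deque

def connected_regions(mask):
    """mask: list of lists of 0/1. Returns a list of regions,
    each a list of (row, col) pixel coordinates."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Flood-fill one region (4-connectivity).
                region, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
print(len(connected_regions(mask)))  # → 2
```

In practice a library labeling routine (e.g. from an image-processing package) would replace this loop; the sketch only shows the scan order and grouping the text describes.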

The features involved in the decision making are: the number of holes inside a region, the center and orientation of the region, the length and width of the region, the ratio of the length to the width, the ratio of the area of the holes to their surrounding region, and the correlation between the region and a face template (Figure 2-5). Note that some of these features are used to calculate others.

Figure 2-5: Face Template used in calculating the correlation [16]

The reasons for choosing these features are their low computational cost of extraction and their reliability when used together. For the details of feature extraction, the reader is referred to [16].
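To make three of the listed features concrete, the following sketch computes the hole count, the length-to-width ratio, and the hole-area ratio for a binary region mask. It is a hypothetical helper, not the extraction procedure of [16]: holes are taken as background components not connected to the image border, and length/width come from the bounding box.

```python
# Sketch: hole count, length/width ratio, and hole-area ratio for one
# binary region (1 = region pixel, 0 = background). Illustrative only.
from collections import deque

def region_features(mask):
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    ys, xs = [p[0] for p in pixels], [p[1] for p in pixels]
    length = max(ys) - min(ys) + 1          # bounding-box height
    width = max(xs) - min(xs) + 1           # bounding-box width
    seen = [[False] * cols for _ in range(rows)]

    def flood(r, c):                        # fill one background component
        queue, size = deque([(r, c)]), 0
        seen[r][c] = True
        while queue:
            y, x = queue.popleft()
            size += 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols \
                        and not mask[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        return size

    # Border-connected background is not a hole: mark it first.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) \
                    and not mask[r][c] and not seen[r][c]:
                flood(r, c)
    # Remaining background components are holes.
    holes, hole_area = 0, 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] and not seen[r][c]:
                holes += 1
                hole_area += flood(r, c)
    return {"holes": holes,
            "length_width_ratio": length / width,
            "hole_ratio": hole_area / len(pixels)}

# A 5x4 block with one interior hole:
ring = [[1, 1, 1, 1],
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
f = region_features(ring)
print(f["holes"], f["length_width_ratio"])  # → 1 1.25
```

For a frontal face region, the eyes and mouth typically produce such interior holes, which is why the hole count and hole-area ratio are discriminative.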

Figure 2-6: Membership functions designed for the fuzzy inference engine input variables

As the first step of the cascaded classifier, a fuzzy inference engine accepts the extracted features for each region and, using a set of fuzzy rules, makes an initial decision about whether the region is a face. We use a Mamdani [18] fuzzy model to implement this inference engine. The engine has four inputs corresponding to the features used: the number of holes, the correlation with a face template, the ratio of the area of the holes to their surrounding region, and the ratio of the length of the region to its width. These inputs are fuzzified using the membership functions shown in Figure 2-6, after which the rules shown in Figure 2-7 are applied. All rules have equal weight, set to one. The aggregation method for the rules is the maximum value, and the defuzzification method is the mean of maximum.
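A minimal Mamdani pipeline with these aggregation and defuzzification choices can be sketched as follows. The membership-function shapes, the rule base, and the two inputs shown here are placeholders; the actual ones are those of Figures 2-6 and 2-7.

```python
# Illustrative Mamdani inference (placeholder shapes and rules, two of the
# four inputs): min implication, max aggregation, mean-of-maximum defuzz.

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def mamdani(correlation, lw_ratio):
    # Fuzzify the inputs (assumed shapes, not those of Figure 2-6).
    corr_high = tri(correlation, 0.4, 1.0, 1.6)
    ratio_ok = tri(lw_ratio, 0.8, 1.3, 1.8)     # faces: roughly 1.3 tall
    # Rules with equal weight 1; implication by min:
    #   R1: IF corr is high AND ratio is ok     THEN face is "yes"
    #   R2: IF corr is high AND ratio is NOT ok THEN face is "maybe"
    fire_yes = min(corr_high, ratio_ok)
    fire_maybe = min(corr_high, 1.0 - ratio_ok)
    # Aggregate rule outputs over a discretized output axis by max.
    xs = [i / 100 for i in range(101)]
    agg = [max(min(fire_yes, tri(x, 0.5, 1.0, 1.5)),    # "yes" output set
               min(fire_maybe, tri(x, 0.0, 0.5, 1.0)))  # "maybe" output set
           for x in xs]
    # Defuzzify: mean of maximum.
    peak = max(agg)
    maxima = [x for x, m in zip(xs, agg) if m == peak]
    return sum(maxima) / len(maxima)

score = mamdani(correlation=0.9, lw_ratio=1.3)
print(score > 0.5)  # → True (high correlation + plausible ratio)
```

The point of the sketch is the flow: fuzzify each feature, fire every rule at weight one, aggregate with max, then take the mean of the maxima as the crisp face score.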


Figure 2-7: The used rules in Fuzzy Inference Engine

Using this fuzzy inference engine establishes an acceptable tradeoff between computation and missed faces, while the rate of correct detection remains acceptably high. Figure 2-8 shows how the output of the fuzzy inference engine changes with respect to its input parameters.

Figure 2-8: Change of the fuzzy inference engine output with respect to changes of its inputs

If the output of the above fuzzy inference engine indicates the presence of a face in a region, the region is fed into the main classifier, described in the following subsection.

2.4.2.2 Main Classifier: Neural Network Optimized by a Genetic Algorithm

The regions accepted in the previous step are resized and then fed to our main classifier, a neural network. Following the work in [19], all parameters involved in the topology of the network (e.g., the number of neurons in the hidden layer, the activation function of each neuron, the learning rates, and the connection weights) are optimized by a genetic algorithm.

As shown in Figure 2-9, each genotype codes a phenotype, or candidate solution. The phenotypes (the resulting neural networks) are trained with the back-propagation algorithm.

The evaluation of a phenotype determines the fitness of its corresponding genotype. The evolutionary procedure deals with genotypes, preferentially selecting genotypes that code phenotypes with high fitness and reproducing them. Genetic operators introduce variety into the population and test variants of the candidate solutions represented in the current population. In this way, over several generations, the population gradually evolves towards genotypes that correspond to phenotypes with high fitness [19]. In our work, the selection method is roulette wheel, the crossover method is one-point, and the crossover probability is 0.9.
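The evolutionary loop with these operators can be sketched as below. The genotype here is a plain bit string with a toy "one-max" fitness; in the actual system the genotype encodes the network topology parameters, and fitness comes from training and evaluating the decoded network. The mutation rate and population size are assumptions, not values from the text.

```python
# Sketch of a GA with the stated operators: roulette-wheel selection and
# one-point crossover with probability 0.9. Toy bit-string genotype.
import random

random.seed(0)

def fitness(genome):
    # Placeholder "one-max" fitness; the real fitness evaluates a trained
    # neural network decoded from the genotype.
    return sum(genome)

def roulette(pop, fits):
    # Fitness-proportionate (roulette-wheel) selection.
    pick = random.uniform(0, sum(fits))
    acc = 0.0
    for g, f in zip(pop, fits):
        acc += f
        if acc >= pick:
            return g
    return pop[-1]

def one_point_crossover(a, b, p=0.9):
    # With probability p, swap tails at a random cut point.
    if random.random() < p:
        cut = random.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(g, rate=0.01):          # assumed mutation rate
    return [bit ^ 1 if random.random() < rate else bit for bit in g]

def evolve(n=20, length=16, generations=30):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(generations):
        fits = [fitness(g) for g in pop]
        nxt = []
        while len(nxt) < n:
            c1, c2 = one_point_crossover(roulette(pop, fits),
                                         roulette(pop, fits))
            nxt += [mutate(c1), mutate(c2)]
        pop = nxt[:n]
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

The loop mirrors the text: select parents in proportion to fitness, recombine them with probability 0.9 at a single cut point, and repeat for 30 generations.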

This algorithm converges after 30 generations.

Figure 2-9: Design process of the evolutionary network topology [19]

The fitness of each phenotype is evaluated by calculating the sum of squared errors of its associated network. Figure 2-10 shows the classification error of the best network on the training and validation sets; training each network for 400 epochs is sufficient.


Figure 2-10: Networks classification error on the training and validation sets vs. epochs.