
7.2 Single dataset

7.2.1 Test of the Scaling of the Sound Signals at the Eardrum

When creating the sound signals, it is possible to specify both the overall input level at the eardrums and a signal-to-noise ratio (SNR). This is described in detail in Chapter 4. The following tests examine how this scaling affects the classification of the signals. First, a scenario with fixed target and noise levels for each source is presented, in which neither an overall input level nor an SNR is specified. In this case, the overall level depends on the number and placement of the target and noise sources. Two further cases are studied, both with the overall input level set to 65 dB: one with an SNR of 0 dB and one with an SNR of 10 dB.
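Chapter 4 describes the actual scaling procedure. The sketch below shows one way the two options could be applied. It assumes that levels are RMS levels in dB relative to an arbitrary reference `ref` (for example digital full scale, or 20 µPa for calibrated pressure). It also assumes that the SNR is the level difference between the target and noise components. `scale_to_level_and_snr` and the other helpers are hypothetical names.

```python
import math

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def level_db(x, ref=1.0):
    """Level in dB relative to an assumed reference RMS value `ref`."""
    return 20.0 * math.log10(rms(x) / ref)

def scale_to_level_and_snr(target, noise, overall_db, snr_db, ref=1.0):
    """Hypothetical helper: return scaled (target, noise) so that the
    target-to-noise level difference is snr_db and the mixture
    target + noise has the level overall_db."""
    # Step 1: rescale the noise relative to the target to obtain the SNR.
    g_noise = rms(target) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    noise = [g_noise * v for v in noise]
    # Step 2: apply one common gain to both signals. This leaves the SNR
    # unchanged and brings the level of the summed signal to overall_db.
    mixture = [t + n for t, n in zip(target, noise)]
    g = ref * 10.0 ** (overall_db / 20.0) / rms(mixture)
    return [g * t for t in target], [g * n for n in noise]

# Toy signals standing in for a speech target and a noise source.
target = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noise = [0.3 * math.cos(2 * math.pi * 13 * n / 100) for n in range(100)]
t, n = scale_to_level_and_snr(target, noise, overall_db=65.0, snr_db=10.0)
print(round(level_db(t) - level_db(n), 6))                  # 10.0
print(round(level_db([a + b for a, b in zip(t, n)]), 6))    # 65.0
```

Scaling the noise first and then applying a shared gain means the requested SNR and overall level are both met exactly, whether or not target and noise are correlated.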

7.2.1.1 Fixed Target and Noise Levels for Each Source

When neither an overall input level at the eardrum nor an SNR is specified, each signal is simply the combination of its sources. The noise level then depends only on how many noise sources are included, their specified levels and their placement. Creating signals in this way and testing them with the framework gives the following confusion matrix.

Table 7.4: Testing the situation with fixed target and noise levels for each source

          car              misc
car       0.9160±0.0469    0.0840
misc      0.0356           0.9644±0.0313
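Each row of Table 7.4 sums to one (for example 0.9160 + 0.0840). This is consistent with normalising the counts per true class, and the ± values look like a spread over repeated runs or folds. Both readings are assumptions. The sketch below shows how a table of this form could be produced from raw counts. The function names and the counts in the example are hypothetical.

```python
from statistics import mean, stdev

def row_normalise(counts):
    """Divide each row (assumed: true class) of a count confusion matrix by its total."""
    return [[c / sum(row) for c in row] for row in counts]

def mean_and_std(matrices):
    """Element-wise mean and sample standard deviation over several runs."""
    k = len(matrices[0])
    means = [[mean([m[i][j] for m in matrices]) for j in range(k)] for i in range(k)]
    stds = [[stdev([m[i][j] for m in matrices]) for j in range(k)] for i in range(k)]
    return means, stds

# Hypothetical counts from three runs (rows: true car/misc, columns: classified car/misc).
runs = [
    [[45, 5], [6, 144]],
    [[47, 3], [4, 146]],
    [[44, 6], [7, 143]],
]
means, stds = mean_and_std([row_normalise(r) for r in runs])
```

The diagonal of `means` then corresponds to the values before the ± sign, and the matching entries of `stds` to the values after it.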

7.2.1.2 Final SNR Set to 0 dB

Setting the overall input level at the eardrum to 65 dB and the SNR to 0 dB makes the speech target hard to separate from the noise signals. The SNR forces the target and the noise to have equal levels, so neither is clearly in focus. This gives the following confusion matrix.

Table 7.5: Testing the situation where the overall input level is set to 65 dB with an SNR of 0 dB

          car              misc
car       0.8027±0.0673    0.1973
misc      0.0629           0.9371±0.0410

7.2.1.3 Final SNR Set to 10 dB

Setting the overall input level at the eardrum to 65 dB and the SNR to 10 dB puts much more focus on the speech target than on the noise signals. This gives the following confusion matrix.

Table 7.6: Testing the situation where the overall input level is set to 65 dB with an SNR of 10 dB

          car              misc
car       0.8720±0.0565    0.1280
misc      0.0516           0.9484±0.0374
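In the two scaled cases of Tables 7.5 and 7.6, the 65 dB overall level is split between target and noise. Assuming the target and noise are uncorrelated, so that their powers add, the split works out as follows. The symbols L_tot, L_T and L_N (overall, target and noise level) are introduced here only for illustration. The 0 dB case shows concretely that target and noise end up at the same level.

```latex
\begin{aligned}
10^{L_{\mathrm{tot}}/10} &= 10^{L_T/10} + 10^{L_N/10}, \qquad L_N = L_T - \mathrm{SNR} \\
\Rightarrow\; L_T &= L_{\mathrm{tot}} - 10\log_{10}\!\left(1 + 10^{-\mathrm{SNR}/10}\right) \\
\mathrm{SNR} = 0\ \mathrm{dB}:\quad L_T &= 65 - 10\log_{10} 2 \approx 61.99\ \mathrm{dB}, \qquad L_N \approx 61.99\ \mathrm{dB} \\
\mathrm{SNR} = 10\ \mathrm{dB}:\quad L_T &= 65 - 10\log_{10} 1.1 \approx 64.59\ \mathrm{dB}, \qquad L_N \approx 54.59\ \mathrm{dB}
\end{aligned}
```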

Scaling the overall input level at the eardrum is not the most realistic way to represent how a hearing aid processes sounds, but it can be useful for simulating certain situations. The results above show that fixed target and noise levels for each source give the highest TP rate for the car environment. That rate is 91.6%, compared with only 80.27% and 87.2% in the other two tests.

The combination of all the noise sources stands out more when no limit is set on their overall level. This gives a more realistic representation of the environments, which are all created so that the classifier can learn to distinguish between realistic environments. Specifying the overall input level and an SNR therefore does not benefit the correct classification rate. Fixed target and noise levels for each source give a much higher correct classification rate and are also the most realistic way to represent the different environments. They are therefore considered the best option. Based on this test, only situations with fixed target and noise levels for each source are considered in the final framework and in all subsequent tests.
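One way to compare the three scaling tests with a single number per table is to average the two diagonal entries. This unweighted mean of the per-class correct rates is often called balanced accuracy. The summary is not used in the original analysis and is added here only as an illustration, using the values from Tables 7.4-7.6. The dictionary keys and the helper name are hypothetical.

```python
# Diagonal (per-class correct) rates copied from Tables 7.4-7.6.
results = {
    "fixed levels": {"car": 0.9160, "misc": 0.9644},
    "65 dB, SNR 0 dB": {"car": 0.8027, "misc": 0.9371},
    "65 dB, SNR 10 dB": {"car": 0.8720, "misc": 0.9484},
}

def balanced_accuracy(per_class_rates):
    """Unweighted mean of the per-class correct classification rates."""
    return sum(per_class_rates.values()) / len(per_class_rates)

for name, rates in results.items():
    print(f"{name}: {balanced_accuracy(rates):.4f}")
# fixed levels: 0.9402
# 65 dB, SNR 0 dB: 0.8699
# 65 dB, SNR 10 dB: 0.9102
```

This summary ranks the three situations in the same order as the car TP rate alone.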

7.2.2 Further Analysis of the Situation with Fixed Target and Noise Levels for Each Source

The previous test shows that fixed target and noise levels for each source give the highest correct classification rate. This situation is therefore used for further investigation of the classification. First, the training set is analysed with a cross-validation. The aim is to find the best possible feature set for pruning the classification tree and to reduce the number of features considered.
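The actual cross-validation setup is described in Section 5.2.2 and may differ from what follows. As a rough illustration, a k-fold split divides the training data into k parts and holds each part out once as validation data. The function names, the shuffling and the fixed seed are hypothetical choices.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k disjoint, near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_validation_splits(n_samples, k, seed=0):
    """Yield (train, validation) index lists; each fold is held out exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

for train, validation in train_validation_splits(n_samples=20, k=5):
    print(len(train), len(validation))  # 16 4 for each of the five folds
```

A tree of each candidate size would be fitted on `train` and its error measured on `validation`. Changing the seed changes which samples fall in each fold, so the validation curve can vary between runs.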

For the single dataset with fixed target and noise levels for each source, a test is conducted to find out whether the feature set can be pruned. Pruning would reduce the number of terminal nodes in the classification tree. For this, a cross-validation is used, see Section 5.2.2. Computing a cross-validation for the tree shows whether the tree should be pruned. With cross-validation, the error rate can be plotted for three sets: training data, validation data and test data. These error rates are shown in Figure 7.2. In the figure, the solid line shows the estimated cost for each tree size, the dashed line marks one standard error above the minimum, and the red square marks the smallest tree under the dashed line.

In the plot for the test set, the red square marks the smallest tree found by the cross-validation of the training set. Both squares mark where pruning could give a tree with fewer computations (minimum cost) and therefore fewer terminal nodes.
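The rule behind the dashed line and the red square can be sketched in a few lines:

- estimate the cost and its standard error for each tree size from per-fold validation errors;
- draw the line one standard error above the minimum cost;
- pick the smallest tree at or below that line.

The per-fold errors in the example are hypothetical. Computing the standard error as the fold standard deviation divided by the square root of the number of folds is also an assumption; the toolbox used in the thesis may compute it differently.

```python
import math
from statistics import mean, stdev

def cv_cost_and_se(fold_errors):
    """Mean cross-validated error and its standard error for one tree size."""
    return mean(fold_errors), stdev(fold_errors) / math.sqrt(len(fold_errors))

def one_se_tree_size(errors_by_size):
    """Smallest tree size whose mean CV error is at most (minimum + 1 SE).

    errors_by_size maps number of terminal nodes -> list of per-fold errors.
    """
    stats = {size: cv_cost_and_se(errs) for size, errs in errors_by_size.items()}
    best_size = min(stats, key=lambda s: stats[s][0])
    threshold = stats[best_size][0] + stats[best_size][1]  # the dashed line
    return min(s for s, (cost, _) in stats.items() if cost <= threshold)  # the red square

# Hypothetical per-fold validation errors for four tree sizes.
errors = {
    1: [0.22, 0.21, 0.22, 0.21],
    3: [0.12, 0.10, 0.11, 0.13],
    5: [0.07, 0.08, 0.06, 0.07],
    8: [0.06, 0.07, 0.06, 0.05],
}
print(one_se_tree_size(errors))  # 8
```

In this toy case only the largest tree lies under the dashed line, so no pruning would be suggested.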

Figure 7.2: Left: the error rate for the training set. Middle: the validation set. Right: the test set. All panels are for the situation with fixed target and noise levels for each source. The solid line shows the estimated cost for each tree size, the dashed line marks one standard error above the minimum, and the red square marks the smallest tree under the dashed line for the validation set.

The error rate for the training data decreases as the number of terminal nodes increases. The error rate measures how well the classifier performs with the specified number of terminal nodes. A low error rate is preferable since it indicates a better classifier, but the more terminal nodes the tree includes, the higher the computational cost. The cross-validation is therefore used to see whether pruning can reduce the number of terminal nodes while keeping the error rate acceptably low. In the case presented here, pruning does not give a smaller tree with an acceptable error rate. For the validation set, the largest tree considered is the only one whose error rate falls below the dashed line, one standard error above the minimum.

The curve for the training set behaves as expected. It starts at a level corresponding to chance, since the car environment makes up 21.43% of the training data, and the error rate then decreases with each increase in tree size. The validation set behaves in almost the same way, but its error rate can also increase with tree size. This depends on which part of the data is used for the validated subsamples. Looking at the test set

rate could give rise to a suspicion that pruning this tree could be beneficial. This is difficult to be certain of, however, and the number of terminal nodes is already fairly small in this case, so no pruning is carried out.

The pruning test was also conducted for the two other situations considered in the scaling test. The error rate plots for these tests can be found in Appendix C. For the examples shown there, both cases would result in a pruned tree, reducing the cost through fewer terminal nodes.

Since no pruning is needed in this case, the next step is to look at the features selected for the classification. Thirty features are selected for the feature set. A full list of these features, along with visual representations of all of them, can be found in Appendix C. The three most important features are shown here to give an idea of their values in the sound files from the different environments. They can be seen in Figure 7.3 and Figure 7.4.

Figure 7.3: Feature used for the first split

(a) Feature used for splitting the left child node from the first split

(b) Feature used for splitting the right child node from the first split

Figure 7.4: Features number 2 and 3
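The features in Figures 7.3 and 7.4 are the ones chosen for the first splits of the tree. A classification tree picks, for each node, the feature and threshold that most reduce the node impurity. The text does not state which impurity measure is used; the sketch below assumes the Gini diversity index. Running `best_threshold` over every feature and taking the largest decrease would select the feature for a split. The helper names and the toy values are hypothetical.

```python
def gini(labels):
    """Gini diversity index 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (impurity decrease, threshold) of the best binary split on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = (0.0, None)
    for thr in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        decrease = parent - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        if decrease > best[0]:
            best = (decrease, thr)
    return best

# Toy feature where car files have low values and the other files high values.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = ["car", "car", "car", "misc", "misc", "misc"]
print(best_threshold(values, labels))  # (0.5, 0.3)
```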

The three most important features show a clear difference between the car environment and the other environments. The feature set also shows that the first three features are all spectral features. They are used for the first splits, so they reduce the impurity of the first nodes the most. In the dataset, the speaker sources are the same in all files and the noise sources are largely similar, depending on the environment. The biggest difference between the sound files therefore lies in the environments themselves, in the form of the impulse responses used. One of the major differences between the car environment and the other environments is its reverberation time, which is very short compared with any of the other included environments. A car is a small environment with many different interior materials. However, most of the noise in a car has no directional cues for the human ear (it is mostly low-frequency noise), so it tends not to cause many reflections that a human ear would catch. The reverberation time is therefore an important factor in telling a car environment apart from the other environments. An environment with a short reverberation time has lower signal energy than one with a long reverberation time, because of the tail in the signal. Spectral features can therefore be used to distinguish between these situations.
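The energy argument can be checked with an idealised model: an impulse response whose amplitude decays exponentially and falls by 60 dB after the reverberation time T60. With the same initial amplitude, the energy of this response grows roughly in proportion to T60. A short, car-like reverberation time therefore gives less energy than a long one. The model, the sampling rate and the T60 values are hypothetical illustrations, not the impulse responses used in the dataset.

```python
import math

def decay_envelope(t60, fs=16000, duration=1.0):
    """Amplitude envelope of an idealised exponentially decaying impulse response.

    The amplitude falls by 60 dB (a factor of 1000) after t60 seconds.
    """
    n = int(fs * duration)
    rate = 3.0 * math.log(10.0) / t60  # ln(1000) / t60
    return [math.exp(-rate * i / fs) for i in range(n)]

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

car_like = energy(decay_envelope(t60=0.05))     # hypothetical short T60
reverberant = energy(decay_envelope(t60=0.8))   # hypothetical long T60
print(reverberant > car_like)  # True
```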

The first two features suggest that the difference lies especially in the frequency bands 0-250 Hz and 1000-4000 Hz.
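The spectral features in Appendix C may be computed differently, for example per frame or through a filter bank. A simple way to see a difference in the two bands mentioned above is to compute the fraction of spectral energy in each band from one FFT. This is a minimal sketch using NumPy's `rfft` and `rfftfreq`. The helper name and the test signal are hypothetical.

```python
import numpy as np

def band_energies(signal, fs, bands):
    """Fraction of spectral energy in each (low, high) band in Hz, using one FFT."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum.sum()
    return [float(spectrum[(freqs >= lo) & (freqs < hi)].sum() / total) for lo, hi in bands]

fs = 16000
t = np.arange(fs) / fs
# Hypothetical test signal: strong 100 Hz component plus a weaker 2 kHz component.
x = np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 2000 * t)
low, mid = band_energies(x, fs, [(0, 250), (1000, 4000)])
print(round(low, 3), round(mid, 3))  # 0.99 0.01
```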

The Mel-frequency spectra and the MFCCs also seem to provide important information (features number 4-6). These measures, along with the zero-crossing rate (feature number 7), are often used in speech/music/noise classification. The noise sources in most of the environments contain speech, music and pure noise signals, so it makes sense that these features are important for the classification. The car environment includes noise sources that are not included in any of the other environments, in particular the low-frequency noise from inside a moving car. Many different kinds of noise sources are placed in all the environments, but those in the car stand out enough for the features to capture the differences and use them in the classification process.
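The zero-crossing rate has several common definitions, and the one used for feature number 7 may differ from this sketch. Here it is the fraction of adjacent sample pairs whose signs differ. A low-frequency signal changes sign rarely and gives a low rate, whereas broadband noise changes sign often. The function name is hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zeros count as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

print(zero_crossing_rate([1, -1, 1, -1, 1]))     # 1.0
print(zero_crossing_rate([0.5, 0.2, 0.1, 0.3]))  # 0.0
```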

In general, the features used to separate the car environment from the other environments revolve around a few measures, which are summarised in Table 7.7.

In document Classification of Sound Environments for Hearing Aid Applications (pages 84-90)