
In document IMM, Technical University of Denmark (pages 54-58)


3.10 Principal Component Analysis of Cepstral Coefficients

As different sets of cepstral coefficients have been derived, it is of interest to conduct a preliminary test of each feature set's ability to separate speakers. This is done by implementing a Principal Component Analysis (PCA) [15] for the 12 MFCC + 12 ∆MFCC, 12 LPCC + 12 ∆LPCC, 12 warped LPCC + 12 ∆warped LPCC and 13 PLPCC + 13 ∆PLPCC feature sets. The orders of these feature extraction methods are chosen because they are commonly used in speech processing applications [1]. The feature sets are extracted from training sentence a for Speaker 1 and Speaker 2, both women. Using the same training sentence ensures that the speech uttered is identical for both speakers, so that each speaker's physiological characteristics, as captured by the feature sets, are the only source of difference between the sentences. This allows an analysis that can highlight which feature sets separate speakers effectively.

The PCA projects the data in the feature matrices onto the directions of greatest variance. The projection of the data onto the first two principal components is shown in Figure 3.17. Ideally, the variance between the two speakers is larger than the variance within each speaker's feature set, ensuring a good separation of the two speech signals.
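The projection step described above can be sketched as follows. This is a minimal NumPy implementation, with random matrices standing in for the actual cepstral feature matrices (the frame counts match Table 3.3 and the dimensionality matches the 12 MFCC + 12 ∆MFCC case; all variable names are illustrative):

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project feature vectors (frames x dims) onto the directions of
    largest variance, i.e. the leading eigenvectors of the covariance."""
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort descending by variance
    components = eigvecs[:, order[:n_components]]
    return centered @ components

# Stand-ins for the two speakers' feature matrices (frames x 24 dims)
rng = np.random.default_rng(0)
speaker1 = rng.normal(0.0, 1.0, size=(1328, 24))
speaker2 = rng.normal(0.5, 1.0, size=(1118, 24))

# Fit the PCA on the pooled data so both speakers share one projection,
# then each point in the 2-D plane can be plotted per speaker.
pooled = np.vstack([speaker1, speaker2])
proj = pca_project(pooled)
print(proj.shape)   # (2446, 2)
```

The pooled fit matters: projecting each speaker with a separately fitted PCA would place the two clouds in unrelated coordinate systems and make the overlap comparison meaningless.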

The MFCC feature set shows considerable overlap between the two speakers, while the PCA on the LPCCs yields a cluster of overlapping data points but also two groups of data that belong exclusively to one speaker or the other. The warped LPCC and PLP coefficients result in data points grouped in less dense clusters than those for MFCC and are thus subject to a lesser degree of overlap between the two speakers' data.

To establish whether it has an effect on the separation of different speakers that a frame of speech is voiced or unvoiced, another PCA is implemented for each feature set.

This is of interest because reducing a feature set while preserving the majority of the speaker-dependent information would significantly improve the speed and performance of an SID system. The first and simplest step in this direction is to use the results of the autocorrelation-with-center-clipping algorithm to divide each feature set into a voiced subset and an unvoiced one. The voiced/unvoiced decisions are made for frames that are 30 ms in length, with an overlap of 10 ms. The number of voiced frames is in each case roughly 4 times that of unvoiced frames, hence the difference in the number of frames used in the two analyses. The frame counts are given in Table 3.3.
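The division described above amounts to masking the frame-level feature matrix with the per-frame voicing decisions. A small sketch, where the boolean mask is a toy stand-in for the output of the center-clipped autocorrelation detector:

```python
import numpy as np

def split_by_voicing(features, voiced_mask):
    """Split a per-frame feature matrix (frames x dims) into voiced and
    unvoiced subsets, given one boolean voicing decision per frame."""
    voiced_mask = np.asarray(voiced_mask, dtype=bool)
    return features[voiced_mask], features[~voiced_mask]

# Toy stand-in: 8 frames of 24-dimensional cepstral features
feats = np.arange(8 * 24, dtype=float).reshape(8, 24)
mask = [True, True, False, True, True, False, True, True]

voiced, unvoiced = split_by_voicing(feats, mask)
print(voiced.shape, unvoiced.shape)   # (6, 24) (2, 24)
```

Each subset can then be fed to the same PCA routine as before, producing the separate voiced and unvoiced projections.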

Table 3.3: Number of voiced and unvoiced frames in training sentence a. All numbers are for frames of 10 ms.

    Speech signal           Total   Voiced   Unvoiced
    Speaker 1, sentence a    1328     1076        252
    Speaker 2, sentence a    1118      864        254
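The "roughly 4 times" figure quoted above can be checked directly from the counts in Table 3.3:

```python
# Voiced/unvoiced frame counts per speaker, taken from Table 3.3
counts = {"Speaker 1": (1076, 252), "Speaker 2": (864, 254)}

# Speaker 1: 1076/252 ~ 4.27; Speaker 2: 864/254 ~ 3.40
ratios = {name: voiced / unvoiced for name, (voiced, unvoiced) in counts.items()}
print(ratios)
```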

[Figure: four scatter plots of Speaker 1 and Speaker 2 projected onto the first two principal components, one panel each for 12∆MFCC, 12∆LPCC, 12∆warped LPCC and 13∆PLP.]

Figure 3.17: PCA on 12∆MFCC, 12∆LPCC, 12∆warped LPCC and 13∆PLPCC

Figure 3.18 shows the results of the PCA on the voiced frames of sentence a from Speakers 1 and 2, while the corresponding results for the unvoiced frames are presented in Figure 3.19.

[Figure: four scatter plots of Speaker 1 and Speaker 2 projected onto the leading principal components, one panel each for the voiced frames of 12∆MFCC, 12∆LPCC, 12∆warped LPCC and 13∆PLP.]

Figure 3.18: PCA on voiced frames of 12∆MFCC, 12∆LPCC, 12∆warped LPCC and 13∆PLPCC

[Figure: four scatter plots of Speaker 1 and Speaker 2 projected onto the leading principal components, one panel each for the unvoiced frames of 12∆MFCC, 12∆LPCC, 12∆warped LPCC and 13∆PLP.]

Figure 3.19: PCA on unvoiced frames of 12∆MFCC, 12∆LPCC, 12∆warped LPCC and 13∆PLPCC

In Figure 3.18 the separation of features along the directions of greatest variance for voiced frames shows no general improvement over the results for all frames in Figure 3.17, though a few changes are visible. There is more overlap between data points for the LPCC feature extraction method and less overlap for the warped LPCC features, while the plots for the MFCC and PLPCC features remain almost unchanged.

The corresponding analysis for the unvoiced frames, shown in Figure 3.19, does not lead to good separation of the data from Speakers 1 and 2. Although the reduced amount of data in these sets makes it superficially seem as if there is less overlap, none of the feature sets shows a clear division of the points into two groups, so the degree of overlap remains high.

Regardless of these early observations, all feature sets will be used in the trials executed in Chapter 9, since with different classifiers the distribution of data in feature space may prove suitable for speaker identification depending on the classification method applied. This preliminary analysis may, however, prove useful in understanding some of the results recorded at a later stage.
