

5.2 Classifying Algorithm

5.2.2 Matlab Function classregtree

A Matlab function, classregtree, is part of the Statistics Toolbox in Matlab, and with this function a classification tree is built. The entire function with all its possibilities is inspired by the theory presented in [8]. In the function, input is given as X, an n-by-m matrix of predictor values, and y, a vector of n response values as a function of the predictors. When the function is run, output is given as t, a binary tree where each branching node is split based on the values of a column of X, see Equation 5.10 for notation.

t = classregtree(X, y) (5.10)

Figure 5.3: Pattern classification viewed as mapping a feature space, V, into a decision space, Q. The division of the square can be represented by the given classification tree, here for a two-dimensional feature space. With inspiration from [8].

At the expense of more computation, the training set can be divided into a number of cross-validation samples, v; one sample can then be used to test the performance of the classifier trained on the remaining (v−1) samples, and finally the v such estimates are averaged. When choosing the number of cross-validations, it is important to note that in [8] the adequate number of partitions of the sample, v, has been tested. Adequate accuracy was obtained with a 10-fold cross-validation, v = 10. In some cases, smaller values of v have given adequate accuracy, but no situation has ever implied that taking a value of v larger than 10 would significantly improve the accuracy of the selected tree. Therefore v = 10 in this framework, to ensure adequate accuracy in the cross-validation without it being too computationally expensive with a large dataset.
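The v-fold procedure described above can be illustrated with a short sketch. The framework itself is implemented in Matlab; the following is a minimal pure-Python illustration of the same partition, train, test and average logic, where the function names and the pluggable classifier interface are hypothetical stand-ins, not part of the actual framework:

```python
import random

def v_fold_indices(n, v=10, seed=0):
    """Partition sample indices 0..n-1 into v random folds of roughly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::v] for k in range(v)]

def cross_validated_error(X, y, train_fn, predict_fn, v=10):
    """Train on v-1 folds, test on the held-out fold, and average the v error estimates."""
    folds = v_fold_indices(len(y), v)
    errors = []
    for k in range(v):
        held_out = set(folds[k])
        train = [(x, t) for i, (x, t) in enumerate(zip(X, y)) if i not in held_out]
        model = train_fn(train)
        wrong = sum(predict_fn(model, X[i]) != y[i] for i in folds[k])
        errors.append(wrong / len(folds[k]))
    return sum(errors) / v
```

Any classifier that can be trained on (feature, label) pairs could be plugged in as train_fn and predict_fn; in the thesis this role is played by the tree built with classregtree.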

In Matlab, the test function can be specified as either test or cross-validation.

For the training set, the test version is calculated. This results in a cost vector, the standard error of each cost value, a vector containing the number of terminal nodes for each subtree, and a scalar containing the estimated best level of pruning. When cross-validation is chosen, the function uses the 10-fold cross-validation to compute the cost vector by partitioning the sample into 10 randomly chosen subsamples of roughly equal size and class proportions. For each subsample, the function fits a tree to the remaining data and uses it to predict the subsample. It pools the information from all subsamples to compute the cost for the whole sample. The same values are generated for the test set as for the training set, but the best level from the cross-validation of the training set is used to decide the point of the smallest tree within one standard error of the minimum-cost tree.
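The "within one standard error" selection at the end of this step can be sketched as follows. This is a hedged Python illustration of the rule, operating on a cost vector and its standard errors such as those returned by the Matlab test function; the function name and the convention that higher pruning levels mean smaller trees are assumptions for the sketch:

```python
def best_pruning_level(cost, se):
    """Pick the smallest tree (highest pruning level) whose estimated cost is
    within one standard error of the minimum-cost tree.

    cost[k] is the estimated misclassification cost at pruning level k
    (level 0 = full tree, higher levels = smaller trees), and se[k] is the
    standard error of that estimate.
    """
    k_min = min(range(len(cost)), key=lambda k: cost[k])
    threshold = cost[k_min] + se[k_min]
    # Among all levels whose cost does not exceed the threshold,
    # take the highest level, i.e. the smallest tree.
    return max(k for k in range(len(cost)) if cost[k] <= threshold)
```

With, say, costs [0.20, 0.21, 0.25, 0.40] and standard errors [0.03, 0.03, 0.04, 0.05], the minimum cost is 0.20 at level 0, the threshold is 0.23, and level 1 is the smallest tree within it.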

The best level of pruning, derived from the cross-validation, can then be used to specify the pruning level in the prune function. This function takes the classification tree as input and prunes it to a specified level. In this work, the best level of pruning is used. If this level is 0, no pruning occurs. Classification trees are pruned by first pruning the branches giving the least improvement in error cost.

Description of the Classification System

The sequence of the classification framework used in this work is described in this chapter, along with the performance measures that are used to evaluate the classifications. The description is supported by a flowchart briefly presenting the steps of the framework in order to clarify the steps along with their input and output.

6.1 Classication Framework

The basic outline is run from the function run_test. This file starts by running the function matlabsetup, which is used to add and remove Matlab and Java paths for the current workspace and to check for specific built-in Matlab toolbox versions. The run_test file is specified for each new test bench in order to perform the correct steps for that particular test bench. All of the following functions and files run automatically when the run_test function is executed. Then a framework of generating sounds, extracting features, generating classifier data and running the classifier is initiated with the function generate_sounds. This function calls the m-files in the sound_specs folder.

These files are created as functions that call the acoustic simulator tool and create output structures that can be used further on to generate sound files in .wav format. These sounds are then used to extract features with the openSMILE toolbox in generate_features. The feature extraction is based on the configuration file chosen from the setup in the toolbox. First of all, it is tested using the emo_large.conf file, which extracts 6669 features. The extracted features are saved in .csv format for further use in the generation of the classification data.

This data is used in generate_classifier_data to make a .mat file containing all samples, the features calculated for each sample and a label marking whether the file is from a car environment or not. If the feature file name begins with "car_", a vector of ones is generated with the same number of rows as the feature file contains; otherwise a vector of zeros is created. This is used further on to load all data and create a test and training split in generate_classifier_input.
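The labelling convention above can be sketched in a few lines. This is a hypothetical Python illustration of the logic (the framework itself does this in Matlab, and the function name here is an invention for the sketch):

```python
def labels_for_feature_file(filename, n_samples):
    """Build a label vector for one feature file: 1 per sample if the file
    comes from a car environment, 0 otherwise.

    A file is taken to belong to the car class when its name starts with
    "car_", mirroring the naming convention used by the framework.
    """
    label = 1 if filename.startswith("car_") else 0
    return [label] * n_samples
```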

This split is used in run_classifier to create a set of training data and test data, and from this the classregtree function defines which of the features are relevant for a correct classification of the given test dataset. Here a cross-validation is also performed in order to find the best combination of features for a potential pruning of the classification tree. From this, a confusion matrix is calculated in analyze_data to find the correct and false classification rates for the two possible classes. Finally, all the features used for classification are extracted and plotted using run_feature_extraction. A visualisation of the sequence of the framework can be seen in Figure 6.1.
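The confusion matrix and the per-class rates computed in analyze_data can be sketched as follows; a minimal Python illustration for the binary car / not-car case, with hypothetical function names:

```python
def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for the binary car / not-car problem.

    m[i][j] counts samples with true class i predicted as class j.
    """
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def classification_rates(m):
    """Correct and false classification rate for each true class."""
    rates = {}
    for cls in (0, 1):
        total = m[cls][0] + m[cls][1]
        correct = m[cls][cls] / total if total else 0.0
        rates[cls] = (correct, 1.0 - correct if total else 0.0)
    return rates
```

For example, true labels [0, 0, 1, 1] against predictions [0, 1, 1, 1] give the matrix [[1, 1], [0, 2]]: class 0 is classified correctly half the time, class 1 always.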

For some of the tests, the framework is altered slightly. This goes, for example, for the tests where one single dataset is created (the last of the conducted tests, see Section 7.2). In these cases, the framework works the same way until the function generate_classifier_data, where a .mat file is created, but all the functions up to this point are run both for a predefined training set and a predefined test set. From here on, the function generate_classifier_input_single_dataset is used to create a single input file for the run_classifier function, where the split into training and test data was decided before any of the functions were run. The predetermined split is still used in the classregtree function to define which features are relevant for a correct classification of the given test dataset. A possible pruning is still of interest in the single dataset tests, along with the confusion matrix and the extraction and plots of the relevant features. In a final test, the features from the pruning are the only ones in focus, so here feature extraction is based only on these features. There is no point in pruning the resulting classification tree from this test any further, which is why pruning is disabled in this test and no cross-validation is performed.

Note that the acoustic simulator tool can only be used when the computer is connected to the network at Oticon. If the user is not connected to this network, only sounds that have already been generated can be used. In this situation, the entire run_test should not be executed; instead, a sound_files folder should already be available containing a number of .wav files, and then the rest of the functions can be run one by one from generate_features to run_feature_extraction.

Figure 6.1: The sequence of the classification framework, going from generation of sounds over feature extraction to the classification of the sounds and further analysis of the classified signals.