
The stacks in the dataset experience different operating conditions and exhibit different evolutions of their system parameters. The evolution may be similar for some subsets of the stacks and different for others. These differences might separate the stacks into distinct groups of similar behavior, which can be detected through clustering. Clustering is a machine learning methodology for dividing a dataset into groups of data points with a high degree of similarity.

In this case, the aim is to group the stacks based on similarities in the time series described in the previous section. However, it is difficult to apply clustering methods to the time series directly. Therefore, a similarity measure between the time series should be applied to obtain a representation of the stacks that is more easily compared. A simple way of measuring the similarity of two time series is to calculate the Euclidean distance between each pair of corresponding data points. However, this approach is only well suited to series of the same length that experience the same trends at the same time instances, which is far from the case for the dataset in question.
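For reference, a minimal sketch of this pointwise Euclidean distance (the series values are made up for illustration):

```python
import numpy as np

# Two synthetic, equal-length series (illustrative values only).
q = np.array([1.0, 2.0, 3.0, 4.0])
c = np.array([1.5, 2.5, 2.5, 4.5])

# Pointwise Euclidean distance: only meaningful when the series have the
# same length and experience the same trends at the same time instances.
dist = np.sqrt(np.sum((q - c) ** 2))
```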

E.3.1 Dynamic Time Warping

Dynamic time warping (DTW) is a distance measure capable of extracting similarities between asynchronous time series of varying length [19]. It seeks to optimize the alignment of two series by nonlinearly warping the time dimension before calculating the distances between the points in the series.

The DTW algorithm is commonly used in fields such as speech recognition [20] and data mining [21], but has also found applications in health prognostics of engineering systems, such as batteries [22].

Suppose two time series $Q$ and $C$ of length $n$ and $m$, respectively. To find the DTW alignment, we first construct an $n \times m$ matrix in which entry $(i, j)$ is the distance between the $i$th element of $Q$ and the $j$th element of $C$, i.e. $q_i$ and $c_j$. The algorithm then seeks the path through the matrix that minimizes the cost of the path, i.e. the sum of distances tracked by the path. Some constraints are imposed when finding this path: it starts at the beginning of both series and ends at the end of both series, so the warped series are aligned at both the beginning and the end; every point in each series must be matched to at least one point in the other series; and the path must be monotonically increasing [19].
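A minimal dynamic-programming sketch of this procedure (the function name and the use of the absolute difference as the local distance are assumptions for illustration; the paper does not prescribe an implementation):

```python
import numpy as np

def dtw_distance(q: np.ndarray, c: np.ndarray) -> float:
    """DTW distance between two 1-D series of (possibly different) lengths."""
    n, m = len(q), len(c)
    # D[i, j]: minimal cumulative cost of aligning q[:i] with c[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - c[j - 1])  # local distance between q_i and c_j
            # Extending from the left, lower, or diagonal neighbor enforces
            # monotonicity and matches every point at least once; starting at
            # (0, 0) and ending at (n, m) aligns the series at both ends.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

In practice, optimized implementations such as those in the tslearn or dtaidistance packages would typically be preferred over this quadratic-time sketch.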

Calculating the DTW distances between the time series of each pair of stacks in the dataset gives a measure of similarity between each stack and every other stack. The result is visualized in Fig. E.3. A low DTW distance represents high similarity; see the diagonal in the matrix plot, which shows that each stack has zero DTW distance to itself. Conversely, a high DTW distance implies low similarity.
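A sketch of how such a symmetric distance matrix could be assembled from the dtw_distance function above (stack_series, a list holding one series per stack, is a hypothetical stand-in for the dataset):

```python
import numpy as np

def dtw_matrix(stack_series: list) -> np.ndarray:
    """Pairwise DTW distance matrix over a list of 1-D series."""
    k = len(stack_series)
    dists = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(stack_series[i], stack_series[j])
            dists[i, j] = dists[j, i] = d  # DTW is symmetric; diagonal stays zero
    return dists
```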

Fig. E.3: Matrix plot of the dynamic time warping distances.

These similarities will be the basis of the clustering analysis in section E.3.3. First, however, the high dimensionality of this dataset (one dimension per stack) is reduced in the following section, to enable easier inspection and visualization of the data.

E.3.2 Principal Component Analysis

Principal component analysis (PCA) is a dimensionality reduction method, which transforms a dataset into a space in which the variables are linearly uncorrelated. The transformed variables are referred to as principal components, which are arranged so that the first principal component has the largest possible variance, the second component has the second largest variance and so forth.

That is, the first principal components contain the most information from the original dataset. Hence, by selecting a subset of the first principal components, the dimensionality of the dataset can be reduced while maintaining as much variance, and thereby information, as possible.

The PCA algorithm involves calculating the covariance matrix ($\Sigma$) of the normalized dataset ($X$):

$$\Sigma = \frac{1}{n-1} X^{T} X \tag{E.1}$$

where $n$ is the number of samples per feature.

The eigenvalues ($\lambda$) and eigenvectors ($\nu$) of the covariance matrix are obtained by solving the equation

$$N^{-1} \Sigma N = \Lambda \tag{E.2}$$

where $N$ is a matrix comprised of the eigenvectors and $\Lambda$ is a diagonal matrix of the eigenvalues.

The transformation matrix ($M$) is then comprised of the first $L$ eigenvectors of $N$. $L$ can be determined by choosing an acceptable level of explained variance.


Fig. E.4: Explained variance (information) of the first ten principal components of the dynamic time warping distances.

This can be done through the proportion of variance ($PoV$), which is calculated from the eigenvalues as

$$PoV = \frac{\sum_{i=1}^{L} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \tag{E.3}$$

where $p$ is the total number of principal components. The transformed dataset is finally calculated by

$$T = M X \tag{E.4}$$
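A sketch of these steps, Eqs. (E.1)-(E.4), combined into one routine (the function name and the 0.9 variance threshold are illustrative; the sketch uses the row-per-sample convention, so the projection appears as $XM$ rather than $MX$):

```python
import numpy as np

def pca_transform(X: np.ndarray, pov_threshold: float = 0.9) -> np.ndarray:
    """PCA via eigendecomposition of the covariance matrix, cf. Eqs. (E.1)-(E.4)."""
    n = X.shape[0]
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # normalize the dataset
    Sigma = X.T @ X / (n - 1)                 # covariance matrix, Eq. (E.1)
    lam, N = np.linalg.eigh(Sigma)            # eigenvalues/eigenvectors, Eq. (E.2)
    order = np.argsort(lam)[::-1]             # sort by descending variance
    lam, N = lam[order], N[:, order]
    pov = np.cumsum(lam) / lam.sum()          # cumulative proportion of variance, Eq. (E.3)
    L = int(np.searchsorted(pov, pov_threshold)) + 1  # smallest L reaching the threshold
    M = N[:, :L]                              # first L eigenvectors form the transformation matrix
    return X @ M                              # transformed dataset, cf. Eq. (E.4)
```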

Applying PCA to the DTW matrix visualized in Fig. E.3, the proportions of explained variance of the first ten principal components are shown in Fig. E.4. The figure shows that the first principal component explains more than half of the variance of the original DTW matrix, while the second component explains roughly 15%. Four principal components are included in the transformation matrix, which represent just over 90% of the variance of the original data while reducing the dimensionality considerably.

E.3.3 DBSCAN

To detect groups within the transformed dataset, obtained through principal component analysis of the dynamic time warping distance matrix, the DBSCAN (density-based spatial clustering of applications with noise) algorithm is used [23]. DBSCAN looks at the number of neighbors of a given point within a certain distance ($\varepsilon$) to determine whether the point is part of a cluster or is a noise point. The second parameter of the model, $MinPts$, is the minimum number of points that should be reachable from the point in question for it to be considered a member of a cluster.

Fig. E.5: Scatter matrix of the transformed and normalized dynamic time warping distances. The clusters, as found by the DBSCAN method, are indicated with different colors. The diagonal plots show the histogram of the given principal component. Each tick on the histogram y-axes represents ten points.

The DBSCAN algorithm classifies points into three categories: core points, border points, and noise points. A given point, $p$, is considered a core point if at least $MinPts$ points are within $\varepsilon$ distance of $p$. If $p$ is not a core point, but there is a core point within $\varepsilon$ distance of $p$, it is labeled a border point. Noise points are neither core nor border points and represent outliers in the dataset. Each core and border point is assigned to a cluster by exhaustively searching each unlabeled point: any point within $\varepsilon$ distance of a core point is appended to the same cluster.

The advantages of DBSCAN are that the algorithm can detect any number of clusters of any shape, as well as detect and ignore outliers. Its main drawback is its sensitivity to the selection of the parameters $\varepsilon$ and $MinPts$.

Applying the DBSCAN algorithm to the four-dimensional PCA-transformed dataset of DTW distances between fuel cell stacks gives the clusters shown in Fig. E.5. The parameters of the algorithm were set to $MinPts = 25$ and $\varepsilon = 0.85$. Two clusters were found: cluster 1 consisting of 67 stacks, and cluster 2 of 33 stacks. In addition, 26 noise points were detected.
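A sketch of this clustering step using the DBSCAN implementation in scikit-learn, which names the $MinPts$ parameter min_samples (the pipeline below reuses the hypothetical dtw_matrix and pca_transform sketches and the stand-in stack_series from earlier):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Tie the earlier sketches together: pairwise DTW distances, reduced to the
# first principal components, then clustered.
T = pca_transform(dtw_matrix(stack_series))

model = DBSCAN(eps=0.85, min_samples=25)
labels = model.fit_predict(T)

# scikit-learn labels noise points -1; cluster members get 0, 1, 2, ...
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
```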

The two clusters are mainly separated in one of the four dimensions, i.e. the first principal component, and are more intertwined in the other three dimensions.