Unsupervised anomaly detection in shearers via autoencoder networks and multi-scale correlation matrix reconstruction

Yang Song; Weidong Wang; Yuxin Wu; Yuhan Fan; Xuan Zhao

doi:10.1007/s40789-024-00730-9



Research Article

Open Access

Published: 19 October 2024


International Journal of Coal Science & Technology Volume 11, article number 79, (2024)

Yang Song ¹, Weidong Wang ¹,², Yuxin Wu ¹, Yuhan Fan ¹, Xuan Zhao ¹

1. China University of Mining & Technology-Beijing, Beijing, China
2. School of Chemical & Environmental Engineering, China University of Mining & Technology-Beijing, Beijing, China

Abstract

As the main equipment in coal mining production, the shearer requires reliable anomaly detection to ensure production efficiency and coal mine safety. One key challenge is the scarcity, or even absence, of labeled monitoring data for the equipment, compounded by the high cost of manual annotation. Another challenge stems from the complex structure of mining machines, which makes it difficult to reflect the overall operational state through local anomaly detection. Consequently, applying decoupled local anomaly detection to mining machines in practical production remains challenging. This paper presents an unsupervised learning-based method for detecting anomalies in shearers. The method includes a module for constructing a Multi-scale Correlation Matrix (MSCM) of mining machine operating conditions, as well as the CNN-ConvLSTM Autoencoder (C-CLA) network. The MSCM construction module uses multiple correlation analysis methods to represent the interrelationships among the equipment's features from different perspectives. The C-CLA network integrates convolutional and convolutional recurrent neural networks: the convolutional structure extracts local spatial features, and the ConvLSTM structure further captures information at different time scales and feature scales, enhancing the model's sensitivity to changes in equipment status. Finally, shearer anomaly detection is achieved by analyzing the reconstruction residual matrices. The rationality and practicality of the proposed method have been validated on our dataset, and the model's generalization capability has been verified through repeated experiments in similar scenarios. However, because working environments differ between mining faces and equipment models vary, applying the method to other mining faces often requires retraining the model with new data. Furthermore, we compared our method with other anomaly detection techniques, and our detection efficiency was approximately 3% higher. This method effectively detects anomalies in the shearer.

1. Introduction

Coal is one of the world's primary resources, and the shearer, as key coal mining equipment, significantly influences mining efficiency and coal mine safety. Without early anomaly detection, the consequences can be catastrophic. Achieving anomaly detection in shearers is therefore of paramount importance.

Existing anomaly detection methods can be classified into two main categories: model-driven and data-driven approaches. Model-driven methods analyze mathematical equations that describe real-world physical processes; they capture the intrinsic mechanisms and features of the system to describe its dynamic behavior. Frank et al. have provided a detailed description of various types of analytical model-based methods (Frank et al. 2000). Venkatasubramanian et al. have compared quantitative model-based methods, qualitative model-based methods, and process history-based methods (Venkatasubramanian et al. 2003a, b, c). Model-driven approaches may offer higher accuracy and interpretability in specific application scenarios, but they require expert knowledge and prior assumptions. Because the relationships among the states of mechanical equipment during operation are complex and nonlinear, owing to factors such as noise and degradation, it is challenging to establish precise physical and mathematical models.

Data-driven methods learn from and mine historical data to approximate the underlying mappings within the data under a specific cost function, yielding corresponding mathematical models. Data-driven methods can be classified into three major categories: statistical analysis-based methods, signal processing-based methods, and deep learning-based methods. Statistical analysis-based methods evaluate the current state through a statistical analysis of historical process data; representative methods include principal component analysis, independent component analysis, partial least squares, and non-negative matrix factorization. Fault detection techniques based on signal processing primarily include the wavelet transform, spectral analysis, and empirical mode decomposition (Qiao and Lu 2015). Liu et al. have extended the application of the wavelet transform in fault diagnosis (Xiaozhi et al. 2019; Kumar and Upadhyaya 2023; Ge et al. 2015; He et al. 2018). The spectral analysis method correlates the results of modal analysis with known spectra, since different types of anomalies induce distinct spectral characteristics in the monitoring signals. The empirical mode decomposition method, proposed by Huang et al. (Huang et al. 1998), offers significant advantages and a high signal-to-noise ratio when handling nonlinear and non-stationary signals. With the advancement of artificial intelligence, deep learning-based methods have found numerous applications in machinery fault detection. Supervised networks such as convolutional neural networks (Guo et al. 2022; Jia et al. 2023; Ruan et al. 2023; Yang et al. 2022b; Xu et al. 2022), recurrent neural networks (Jalayer et al. 2021; Liu et al. 2021a, b; Shi et al. 2022; Gompel et al. 2022; Zhang et al. 2021), and transformers (Ding et al. 2022; Wu et al. 2022) achieve high detection accuracy and generalization capability by leveraging labeled data. In contrast, unsupervised approaches based on autoencoder networks and generative adversarial networks exploit the latent information in the data to detect equipment anomalies without annotated information.

In the context of shearer anomaly detection, several practical issues challenge existing research. The shearer's complex structure, with its interdependent components, makes it difficult to capture the overall abnormal state of the equipment through localized anomaly detection alone, which may mask or misjudge the equipment's overall anomalies. Furthermore, because of the harsh and complex operating conditions of the shearer, labeled monitoring data are often scarce or even unobtainable. The expert knowledge and human resources required for manual annotation are also difficult to provide, rendering traditional supervised learning algorithms inadequate for practical applications. It is therefore crucial to develop an unsupervised approach that can effectively mine the data for shearer anomaly detection. Unlike traditional supervised learning algorithms, unsupervised learning does not rely heavily on labeled data; instead, it explores relationships within the data to learn its inherent features and patterns, thus enabling anomaly detection. To address these challenges, this study proposes an unsupervised learning-based method for shearer anomaly detection. The main contributions of this study are as follows:

(1) By employing various methods to quantify the interrelationships among the multidimensional features of the shearer, we construct a Multi-Scale Correlation Matrix (MSCM) of the shearer’s operating conditions. This matrix captures the correlations between different features from different perspectives, collectively reflecting the state information of the shearer.

(2) Leveraging the architecture of an autoencoder network, we combine convolutional recurrent neural networks and convolutional structures to construct the CNN-ConvLSTM Autoencoder (C-CLA) network. The ConvLSTM structure in the encoder captures information at different time scales and feature scales, while the convolutional structure focuses on extracting local features. The decoder reconstructs the shearer's operating condition features, and shearer anomaly detection is performed by analyzing the reconstruction error matrix.

2. Related work

In recent years, deep learning has emerged as a prominent and leading direction in the field of artificial intelligence, achieving significant success in numerous tasks, particularly in supervised learning where it has made groundbreaking advancements. Supervised learning has been applied to machinery anomaly detection, relying on extensive labeled data to train models and establish connections between existing data and labels. This enables the identification of abnormal states in new data through an end-to-end approach.

In machinery anomaly detection, vibration and sound signals can be transformed into image or video data. Convolutional neural networks (CNNs), which excel at image processing, can then perform anomaly classification and learn the characteristics of complex data structures with high accuracy and generalization capability. Lu et al. converted multidimensional time series into images representing healthy states, allowing deep convolutional networks to identify anomalous conditions (Lu et al. 2020). Similarly, in the aviation domain, CNNs and their variants have been used to detect anomalies in equipment or monitoring data (Du et al. 2022; Song et al. 2022). By extracting features from signals such as the vibrations and accelerations of rotating devices with convolutional networks and combining them with equipment operating conditions, motor faults can be detected (Junior et al. 2022; Khan et al. 2020). Recurrent neural networks (RNNs) are adept at handling time series data, such as sensor measurements of vibration, temperature, and sound. Through forward and backward propagation, RNNs capture the dependencies between data points and extract valuable features from historical data for anomaly detection (Zhu et al. 2022a, b). Zhang et al. transformed one-dimensional vibration time series into two-dimensional images and used the Gated Recurrent Unit to learn representative features from the constructed images, enabling fault recognition (Zhang et al. 2021). Qiao et al. used a CNN to extract spatial features and combined them with Long Short-Term Memory (LSTM) to extract temporal features, successfully diagnosing rolling bearing faults on publicly available datasets (Qiao et al. 2020). The Transformer architecture and its adaptations offer stronger parallel computing capabilities because they avoid sequential computation. Furthermore, through attention mechanisms, Transformers can effectively capture global information from input sequences by considering all elements (Nath et al. 2022; Li et al. 2022; Yang et al. 2022a).

Unsupervised learning tackles various challenges in anomaly detection by utilizing training samples with unknown categories. The data provided to the algorithm possesses rich internal structures, allowing for anomaly detection through learning the inherent distribution of the data itself. Additionally, it is capable of effectively detecting anomalies of unknown types. The application of unsupervised learning in the domain of mechanical equipment anomaly detection can be categorized into several approaches, including clustering-based methods, generative adversarial network-based methods, and autoencoder-based methods, among others. Clustering-based methods involve grouping similar data points into the same category without the need for predefined classes. This approach is typically implemented using algorithms such as k-means and spectral clustering. It is suitable for scenarios with large volumes of data and multiple features, enabling the identification of various abnormal states in the equipment (Yiakopoulos et al. 2011; Zhu et al. 2022a, b). The approach based on generative adversarial networks involves the interplay between a generator and a discriminator. The generator aims to produce highly realistic but slightly different anomalous samples, while the discriminator is trained using both generated and real anomalous samples to distinguish between real and generated data. Through iterative training, the generator gradually improves its ability to generate more authentic samples, while the discriminator becomes increasingly accurate in identifying anomalous data (Chen et al. 2022; Liang et al. 2023; Liu et al. 2021a, b; Xiao et al. 2022). The method based on autoencoders involves compressing and decompressing data using an autoencoder to capture essential features and eliminate factors with minimal influence, thus obtaining a simplified representation of the core data. 
This approach is suitable for handling large datasets with high feature dimensions and incomplete labels. It effectively extracts the primary features of the data, thereby enhancing the accuracy of anomaly detection (Zheng and Zhao 2020).

3. Method

In this section, we first outline the problem we aim to address and introduce the proposed method for constructing the MSCM of the shearer. We then describe the structure of the autoencoder-based C-CLA data reconstruction network.

3.1 Problem statement

Given a segment of shearer operating data $X={({X}_{1},\cdots,{X}_{n})}^{T}\in{\mathbb{R}}^{n\times T}$ of total length T with n features, our objective is to determine whether the segment is anomalous and to analyze which factors have the greatest influence on the anomalous state.

3.2 Construction method of MSCM

The multidimensional operational features of the shearer are correlated with one another. Features such as current, temperature, and oil temperature exhibit linear correlations, whereas an abnormal machine state may also induce nonlinear correlations among the features. Moreover, these features are temporally correlated. Previous studies have shown that sensors at different measurement points of the machine provide different dimensions of information, and a comprehensive analysis of this information can reflect the machine's state. The correlations among features of different dimensions therefore play a crucial role in characterizing the machine's state.

Specifically, consider any two feature vectors of the shearer over a window of length W, denoted ${X}_{i}=({x}_{i1},\cdots,{x}_{ik},\cdots,{x}_{iW})$ and ${X}_{j}=({x}_{j1},\cdots,{x}_{jk},\cdots,{x}_{jW})$. Their correlation is quantified with the three measures described below.

The Pearson correlation coefficient measures the linear correlation between two vectors; it ranges over [-1, 1], where 0 indicates no linear correlation. It first centers the two vectors by subtracting each vector's mean from its elements, so that both centered vectors have zero mean, and then computes the cosine similarity of the centered vectors. Calculating the Pearson correlation coefficient for every pair of features of the multivariate time series yields a Pearson correlation matrix:

$$\begin{array}{c}\text{Pearson}=\frac{{\sum}_{k=1}^{W}\left({x}_{ik}-\bar{{x}_{i}}\right)\left({x}_{jk}-\bar{{x}_{j}}\right)}{\sqrt{{\sum}_{k=1}^{W}{\left({x}_{ik}-\bar{{x}_{i}}\right)}^{2}}\sqrt{{\sum}_{k=1}^{W}{\left({x}_{jk}-\bar{{x}_{j}}\right)}^{2}}}\end{array}$$

(1)

where ${x}_{ik}$ is the k-th element of the i-th feature, W is the sequence length, and $\bar{{x}_{i}}$ is the mean of the i-th feature sequence.
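A minimal NumPy sketch of Eq. (1), assuming each window is stored as an (n, W) array with one feature per row. It follows the centering-then-cosine-similarity view described above and cross-checks the result against `np.corrcoef`. The treatment of constant features (zero variance, e.g. a sensor whose value does not change within the window) is our assumption: Eq. (1) is undefined there, and this sketch reports 0. `pearson_matrix` is a hypothetical helper name.

```python
import numpy as np


def pearson_matrix(window: np.ndarray) -> np.ndarray:
    """Pairwise Pearson coefficients (Eq. 1) for an (n, W) window of n features."""
    centered = window - window.mean(axis=1, keepdims=True)  # subtract each feature's mean
    norms = np.linalg.norm(centered, axis=1)                # sqrt of summed squared deviations
    constant = norms < 1e-12                                # zero-variance features
    safe = np.where(constant, 1.0, norms)                   # avoid division by zero
    corr = (centered @ centered.T) / np.outer(safe, safe)   # cosine similarity of centered rows
    corr[constant, :] = 0.0                                 # assumption: undefined -> 0
    corr[:, constant] = 0.0
    return corr


rng = np.random.default_rng(0)
window = rng.random((20, 200))   # hypothetical window: n = 20 features, W = 200 samples
P = pearson_matrix(window)
print(P.shape)                              # (20, 20)
print(np.allclose(P, np.corrcoef(window)))  # True
```

For non-constant features the result matches `np.corrcoef`, which returns NaN (with a warning) for constant rows instead.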

The Spearman correlation coefficient is a non-parametric measure of rank correlation: it compares the rank orders of two variables in the dataset to determine whether their relative orderings agree. Because it depends only on ranks, it is less influenced by extreme values and captures monotonic relationships between features, whether linear or not. The Spearman correlation coefficient can therefore assess non-linear (monotonic) correlations between features and is more robust to datasets containing outliers. Calculating the Spearman correlation coefficient for every pair of feature vectors of the multivariate time series yields a Spearman correlation matrix:

$$\text{Spearman}=\frac{\frac{1}{W}\sum_{k=1}^{W}\left(R(x_{ik})-\overline{R(x_{i})}\right)\left(R(x_{jk})-\overline{R(x_{j})}\right)}{\sqrt{\frac{1}{W}\sum_{k=1}^{W}{\left(R(x_{ik})-\overline{R(x_{i})}\right)}^{2}}\sqrt{\frac{1}{W}\sum_{k=1}^{W}{\left(R(x_{jk})-\overline{R(x_{j})}\right)}^{2}}}$$

(2)

where $R({x}_{ik})$ denotes the rank of the k-th element of the feature vector ${x}_{i}$, and $\overline{R({x}_{i})}$ denotes the mean rank of ${x}_{i}$.
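Eq. (2) is the Pearson coefficient applied to ranks, so it can be sketched by ranking each feature row and reusing `np.corrcoef`. Because the shearer's sensors record values only when they change, repeated values (ties) are plausible, so this sketch assigns tied values the average of their positions. The helper names are hypothetical, and this is an illustration rather than the authors' code.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks of a 1-D array; tied values share the mean of their positions."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    start = 0
    while start < len(x):
        end = start
        # Extend the run of equal values in sorted order.
        while end + 1 < len(x) and sorted_x[end + 1] == sorted_x[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman_matrix(window: np.ndarray) -> np.ndarray:
    """Eq. (2) for an (n, W) window: Pearson correlation of each row's ranks."""
    ranked = np.vstack([average_ranks(row) for row in window])
    return np.corrcoef(ranked)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
window = np.vstack([x, x ** 3, -x])  # monotonic non-linear pair plus a decreasing copy
S = spearman_matrix(window)
# x and x**3 have identical ranks, so S[0, 1] is 1 although the relation is non-linear;
# -x reverses the ranks, so S[0, 2] is -1.
print(np.round(S, 3))
```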

Temporal cross-correlation quantifies the relationship between the feature vectors of two time series across the time steps of a window. The shearer's operating-condition data are time series. By multiplying the two series element-wise at each time step and summing over the window (i.e., taking their inner product), we obtain a correlation measure computed over the aligned time steps. Unlike Eq. (1), the series are not centered, so this measure also reflects the magnitudes of the features. The temporal cross-correlation matrix is constructed by calculating this cross-correlation for every pair of feature vectors:

$$M_{\text{TC}}(i,j)=\frac{\sum_{k=1}^{W}x_{ik}x_{jk}}{w}$$

(3)

Where W represents the length of the sequence, and w denotes the scaling factor, typically set as w = W by default.
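For all feature pairs at once, Eq. (3) reduces to one matrix product, $M_{\text{TC}} = XX^{T}/w$ for an (n, W) window X. The sketch below assumes that layout, and `temporal_cross_correlation` is a hypothetical name. The toy example uses two min-max-scaled features that are perfect mirror images: their Pearson coefficient is -1, yet their uncentered cross-correlation is positive. This is one way the three measures carry complementary information.

```python
import numpy as np
from typing import Optional


def temporal_cross_correlation(window: np.ndarray, w: Optional[float] = None) -> np.ndarray:
    """Eq. (3) for every feature pair of an (n, W) window: X @ X.T / w (default w = W)."""
    scale = float(window.shape[1]) if w is None else float(w)
    return window @ window.T / scale


# Two hypothetical normalized features over W = 4 steps; the second is 1 minus the first.
window = np.array([[0.0, 0.5, 1.0, 0.5],
                   [1.0, 0.5, 0.0, 0.5]])
M = temporal_cross_correlation(window)
print(M[0, 1], M[0, 0])               # 0.125 0.375
print(np.corrcoef(window)[0, 1])      # approximately -1.0
```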

We partition the multidimensional time-series condition matrix $X={({X}_{1},\cdots,{X}_{n})}^{T}\in{\mathbb{R}}^{n\times T}$ of length T into consecutive subsequences of length W. For each subsequence, we apply each of the three correlation measures independently and stack the resulting matrices into a multi-scale correlation matrix. Combining the measures lets the strengths of each compensate for the limitations of the others, enabling a deeper investigation of the evolutionary patterns and dynamic characteristics of the multivariate time series and improving the accuracy of the analysis. The construction of the MSCM is shown in Fig. 1. The matrices in the figure are rendered as heatmaps, where brighter colors indicate stronger correlations between features and darker colors indicate weaker correlations; the heatmaps highlight the structure of our data matrix.
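The windowing and stacking can be sketched as follows. Assumptions: windows are non-overlapping, the channel order is Pearson, Spearman, temporal cross-correlation (our choice), and ranks come from a double `argsort` for brevity, which breaks ties by position rather than averaging them. `mscm_for_window` and `build_mscm_sequence` are hypothetical names. A feature that is constant within a window makes `np.corrcoef` return NaN, so real data would need a fill rule.

```python
import numpy as np


def rank_rows(x: np.ndarray) -> np.ndarray:
    """Ordinal ranks per row (ties broken by position; a simplification)."""
    return np.argsort(np.argsort(x, axis=1, kind="mergesort"), axis=1).astype(float)


def mscm_for_window(window: np.ndarray) -> np.ndarray:
    """Stack Pearson, Spearman and temporal cross-correlation into a (3, n, n) array."""
    pearson = np.corrcoef(window)
    spearman = np.corrcoef(rank_rows(window))
    temporal = window @ window.T / window.shape[1]   # Eq. (3) with w = W
    return np.stack([pearson, spearman, temporal])


def build_mscm_sequence(series: np.ndarray, W: int) -> np.ndarray:
    """Split an (n, T) series into consecutive length-W windows -> (T // W, 3, n, n)."""
    T = series.shape[1]
    windows = [series[:, s:s + W] for s in range(0, T - W + 1, W)]
    return np.stack([mscm_for_window(win) for win in windows])


series = np.random.default_rng(1).random((20, 1000))  # hypothetical: n = 20, T = 1000
mscm = build_mscm_sequence(series, W=200)
print(mscm.shape)   # (5, 3, 20, 20)
```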

3.3 C-CLA network

The model framework is based on an autoencoder, which aims to uncover the latent structure of the data and remove factors with minimal influence, thereby discovering a simplified representation of the core data. The encoder compresses the high-dimensional source information into a latent feature space, and the decoder reconstructs the features from it. Training minimizes the reconstruction error between the input and its reconstruction, progressively improving the accuracy of the autoencoder network, and requires no labeled data. Because the extracted features have a much lower dimensionality than the source data, the burden on subsequent network training is reduced while excellent training results are still achieved. The encoder output is then passed to the decoder for processing.

Convolutional neural networks have demonstrated excellent performance in various domains, particularly in image-related tasks such as image classification, image semantic segmentation, image retrieval, and object detection, making them well-suited for handling spatial data. In this paper, we employ a convolutional neural network to process multiscale correlation matrices. By utilizing convolutional layers, the network can preserve the spatial continuity of the data and extract local features. CNN provides a hierarchical representation of the data, allowing the original signals to be processed layer by layer, gradually recognizing parts and wholes.

Recurrent neural networks are widely used for sequential data, and the operational data of coal mining machines are naturally temporal. The LSTM architecture mitigates the vanishing gradient problem of traditional RNNs by introducing a “cell” unit. The LSTM cell maintains two kinds of state: a long-term cell state and a short-term hidden state. Between two adjacent time steps of the operational data, less important information is forgotten from the long-term state, while filtered short-term information is added to it; the two are then combined and passed to subsequent steps. The long-term state can be thought of as a “highway” along which information flows like traffic: small linear operations, akin to vehicles leaving or merging into the traffic, have minimal impact on the information traveling along the highway. Thus even rock pressure information from early time steps can be carried to later time steps, overcoming the limitations of short-term memory. The LSTM uses three gates, namely the input gate, the output gate, and the forget gate, which protect and control the cell state. Each gate is a sigmoid layer whose outputs, between 0 and 1, are multiplied element-wise with the signal to determine how much information passes through. ConvLSTM additionally exploits the advantages of convolution, such as parameter sharing and translation invariance, to capture spatial correlations in the data. This enables effective processing of spatiotemporal sequential data, with information propagating between the hidden state and the cell state. The ConvLSTM gates are formulated as follows:

$${I}_{t}=\sigma\left({X}_{t}*{W}_{xi}+{H}_{t-1}*{W}_{hi}+{W}_{ci}{C}_{t-1}+{b}_{i}\right)$$

(4)

$${F}_{t}=\sigma\left({X}_{t}*{W}_{xf}+{H}_{t-1}*{W}_{hf}+{W}_{cf}{C}_{t-1}+{b}_{f}\right)$$

(5)

$${O}_{t}=\sigma\left({X}_{t}*{W}_{xo}+{H}_{t-1}*{W}_{ho}+{W}_{co}{C}_{t-1}+{b}_{o}\right)$$

(6)

Here, * denotes the convolution operator, which replaces the matrix multiplications of the standard LSTM. At time step t, ${X}_{t}$ is the input signal, ${H}_{t-1}$ is the output (hidden state) of the previous time step, W and b are the corresponding weights and biases, ${C}_{t-1}$ is the cell state of the previous time step, and σ is the sigmoid function.
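To make Eqs. (4)-(6) concrete, here is a minimal NumPy forward pass of one ConvLSTM cell over five time steps. It is an illustrative sketch, not the authors' implementation. Several choices are assumptions: the candidate-state, cell-update and hidden-update equations (not given in the text) are the standard ConvLSTM ones; the peephole weights W_ci, W_cf, W_co act element-wise; the kernel size is 3 with 8 hidden channels; and the raw 3 × 20 × 20 MSCMs are fed in directly instead of CNN features. As written in Eq. (6), the output gate reads C_{t-1}; some ConvLSTM formulations use C_t there instead.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Same'-padded 2-D convolution in the deep-learning sense (cross-correlation).

    x: (C_in, H, W) input; k: (C_out, C_in, kh, kw) kernel with odd kh, kw.
    """
    _, _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))   # zero padding keeps H x W
    H, W = x.shape[1:]
    out = np.zeros((k.shape[0], H, W))
    for i in range(kh):
        for j in range(kw):
            # Add the contribution of kernel tap (i, j), summed over input channels.
            out += np.einsum("oc,chw->ohw", k[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(X_t, H_prev, C_prev, p):
    """One ConvLSTM step: gates per Eqs. (4)-(6), then the standard state updates."""
    def conv_terms(g):
        return conv2d_same(X_t, p["Wx" + g]) + conv2d_same(H_prev, p["Wh" + g])

    I_t = sigmoid(conv_terms("i") + p["Wci"] * C_prev + p["bi"])   # Eq. (4)
    F_t = sigmoid(conv_terms("f") + p["Wcf"] * C_prev + p["bf"])   # Eq. (5)
    O_t = sigmoid(conv_terms("o") + p["Wco"] * C_prev + p["bo"])   # Eq. (6), uses C_{t-1}
    C_t = F_t * C_prev + I_t * np.tanh(conv_terms("c") + p["bc"])  # assumed cell update
    H_t = O_t * np.tanh(C_t)                                        # assumed hidden update
    return H_t, C_t


def init_params(c_in, c_hid, size, k=3, seed=0):
    """Hypothetical small random initialisation, for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifoc":
        p["Wx" + g] = rng.normal(0.0, 0.1, (c_hid, c_in, k, k))
        p["Wh" + g] = rng.normal(0.0, 0.1, (c_hid, c_hid, k, k))
        p["b" + g] = np.zeros((c_hid, 1, 1))
    for g in "ifo":
        p["Wc" + g] = rng.normal(0.0, 0.1, (c_hid, size, size))
    return p


# Hypothetical sizes: 3 input channels (one per correlation matrix), 8 hidden channels.
p = init_params(c_in=3, c_hid=8, size=20)
H, C = np.zeros((8, 20, 20)), np.zeros((8, 20, 20))
sequence = np.random.default_rng(2).random((5, 3, 20, 20))  # 5 time steps of MSCMs
for X_t in sequence:
    H, C = convlstm_step(X_t, H, C, p)
print(H.shape)  # (8, 20, 20)
```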

The autoencoder network devised in this paper consists of a convolutional network and a ConvLSTM. In the encoder, the input at each time step is processed separately: at each time step, a convolutional network extracts local features from the multi-scale correlation matrices. The resulting feature maps are then concatenated and fed sequentially into the ConvLSTM. The decoder applies deconvolution to the ConvLSTM output at the final time step, and the result is merged with the ConvLSTM output from the previous time step. This process is repeated iteratively until the data are reconstructed. The structure of the C-CLA network is shown in Fig. 2.

4. Experiments

4.1 Datasets description and configuration

The original data are sourced from the Yujialiang Coal Mine in Shenmu City, Shaanxi Province, China. In the initial phase, we constructed a knowledge graph of shearer faults by combining common real-world fault scenarios with expert knowledge. By relating fault outcomes to anomalies in the parameters of equipment components, we determined the operational data dimensions required for our shearer. The data comprise 20 dimensions from various components, including the shearer's cutting part, traction part, rocker arm, and main pump. The raw data span 54 days of operation, with approximately two million data points per dimension. Because of harsh mining conditions and communication equipment failures, the quality of the raw data is relatively low, so various data cleaning methods were applied selectively. Features may have different numerical scales, which could make the model more sensitive to certain features during training; we therefore applied min-max normalization, which maps the value range of each feature to the same interval. This eliminates scale discrepancies and allows the model to learn the influence of each feature in a more balanced manner. The dataset is then constructed by building multi-scale correlation matrices from the data. A sliding window with a size (w) of 200 data points is used to extract the data at each time step. (The sensors record data only when a value actually changes; after the timestamps of all dimensions are aligned, the sampling interval is approximately 1 s, so a window of length w spans approximately 200 s.) Five consecutive time steps form the model input, and the last time step is the reconstruction target. A single input instance therefore has the shape $5\times3\times20\times20$ (5 time steps, 3 correlation matrices, and 20 × 20 feature pairs), and the dataset contains approximately 1,500 healthy instances. The dataset is divided into training, validation, and testing sets in a 6:2:2 ratio. Part of the test set is used for threshold selection, and the remaining test data, together with several abnormal data instances, are used to assess the model's performance.
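The preprocessing in this subsection can be sketched as follows: per-feature min-max normalization, grouping five consecutive MSCMs into one 5 × 3 × 20 × 20 input whose last step is the target, and a 6:2:2 split. The stride of one window between samples and the chronological (unshuffled) split are assumptions, since the text specifies neither; a chronological split also limits leakage between overlapping samples. All helper names are hypothetical. Min-max scaling is a positive affine map per feature, so it leaves the Pearson and Spearman channels unchanged and only affects the temporal cross-correlation channel.

```python
import numpy as np


def min_max_normalize(series: np.ndarray) -> np.ndarray:
    """Map each feature (row) of an (n, T) array to [0, 1]; constant rows become 0."""
    lo = series.min(axis=1, keepdims=True)
    span = series.max(axis=1, keepdims=True) - lo
    return (series - lo) / np.where(span == 0, 1.0, span)


def make_samples(mscm_seq: np.ndarray, steps: int = 5):
    """Group consecutive (3, n, n) MSCMs into (steps, 3, n, n) inputs.

    The last step of each input is its reconstruction target. Advancing by one
    window between samples is an assumption.
    """
    inputs = np.stack([mscm_seq[s:s + steps] for s in range(len(mscm_seq) - steps + 1)])
    targets = inputs[:, -1]
    return inputs, targets


def chronological_split(num: int):
    """6:2:2 split of sample indices in time order (ordering is an assumption)."""
    a, b = num * 6 // 10, num * 8 // 10
    idx = np.arange(num)
    return idx[:a], idx[a:b], idx[b:]


mscm_seq = np.random.default_rng(3).random((104, 3, 20, 20))  # hypothetical MSCM sequence
X, y = make_samples(mscm_seq)
train_idx, val_idx, test_idx = chronological_split(len(X))
print(X.shape, y.shape)                             # (100, 5, 3, 20, 20) (100, 3, 20, 20)
print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```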

4.2 Metrics and implementation details

The model uses the Mean Absolute Error (MAE) as its loss function. MAE is differentiable everywhere except where an error is exactly zero (where a subgradient is used), so gradient-based optimizers such as gradient descent can be applied conveniently, and it directly reflects the magnitude of the model's error. Furthermore, for a data segment, the reconstruction is repeated over multiple sliding windows to detect temporal anomalies. The evaluation metric is the percentage of sliding windows in a continuous data segment whose reconstruction error exceeds a threshold.
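The loss and the segment-level metric can be sketched as below. `mae` and `exceedance_percentage` are hypothetical helper names, the per-window errors are made-up values, and the threshold of 0.08 is arbitrary.

```python
import numpy as np


def mae(reconstruction: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over all entries of a reconstructed (3, n, n) MSCM."""
    return float(np.mean(np.abs(reconstruction - target)))


def exceedance_percentage(window_errors, threshold: float) -> float:
    """Percentage of sliding windows in a segment whose error exceeds the threshold."""
    errors = np.asarray(window_errors, dtype=float)
    return 100.0 * np.count_nonzero(errors > threshold) / errors.size


# Hypothetical per-window reconstruction errors for one continuous segment.
segment_errors = [0.02, 0.03, 0.09, 0.11, 0.04, 0.12, 0.02, 0.03, 0.10, 0.02]
print(exceedance_percentage(segment_errors, threshold=0.08))  # 40.0
print(mae(np.zeros((3, 2, 2)), np.full((3, 2, 2), 0.5)))       # 0.5
```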

In this study, all models were implemented in PyTorch 1.7.1 and trained on an NVIDIA 3090 GPU. Because individual samples are relatively small, we used a batch size of 256; with this setting, one batch of training takes approximately 3 min on our GPU. We ran a total of 356 iterations, until the loss function no longer decreased.

The code was written in Python 3.6.13. The batch size was 256, and the optimizer was SGD with an initial learning rate of 0.001. We gradually reduced the learning rate during training to adjust the model parameters smoothly, making it easier for the model to converge and perform well on test data. Specifically, starting from the initial learning rate of 0.001, we used an adaptive learning rate strategy: whenever the loss failed to improve for 20 consecutive iterations, or improved by less than 0.0001, the current learning rate was multiplied by 0.8. This strategy aims to suppress oscillation and instability during training, thereby improving the stability and performance of model training.
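The learning-rate rule can be sketched in plain Python. We read the two conditions as a single plateau test: an iteration counts as stalled unless the loss improves on the best value so far by at least 1e-4. We also reset the counter after each decay. Both readings are assumptions, and `PlateauDecay` is a hypothetical name. In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.8, patience=20, threshold=1e-4, threshold_mode="abs")` implements a similar rule, although its counting conventions differ slightly.

```python
class PlateauDecay:
    """Multiply the learning rate by `factor` after `patience` consecutive
    iterations without an improvement of at least `min_delta` in the loss."""

    def __init__(self, lr=1e-3, factor=0.8, patience=20, min_delta=1e-4):
        self.lr, self.factor = lr, factor
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, loss: float) -> float:
        if loss < self.best - self.min_delta:   # a meaningful improvement
            self.best = loss
            self.bad_steps = 0
        else:                                    # stalled, or improved by < min_delta
            self.bad_steps += 1
            if self.bad_steps >= self.patience:
                self.lr *= self.factor
                self.bad_steps = 0               # start a fresh waiting period
        return self.lr


sched = PlateauDecay()
lrs = [sched.step(0.5) for _ in range(41)]   # the loss never improves after the first step
print(lrs[0], lrs[20], lrs[40])              # decays by 0.8 at the 21st and 41st calls
```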

Additionally, we employed dropout regularization, which randomly deactivates a portion of the neurons during training to reduce model complexity and mitigate the risk of overfitting. In each training batch, the output of every neuron is set to zero with a probability of 0.5. This prevents the model from relying too heavily on any single neuron and encourages it to learn multiple independent feature representations. As a result, the model adapts better to various inputs and data distributions, which improves its robustness and generalization ability.
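A minimal NumPy sketch of the dropout step with p = 0.5. It uses inverted dropout, which scales surviving activations by 1/(1 - p) during training so that no rescaling is needed at test time. PyTorch's `nn.Dropout` also behaves this way, but whether the authors' layers follow exactly this form is an assumption; `dropout` here is a hypothetical helper.

```python
import numpy as np


def dropout(activations: np.ndarray, p: float = 0.5, training: bool = True, rng=None):
    """Inverted dropout: zero each unit with probability p during training and scale
    the survivors by 1 / (1 - p); return the input unchanged at test time."""
    if not training or p == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= p   # keep each unit with probability 1 - p
    return activations * keep / (1.0 - p)


a = np.ones(100_000)
out = dropout(a, p=0.5, rng=np.random.default_rng(0))
print(np.mean(out == 0.0), out.mean())   # roughly 0.5 of units dropped; mean stays near 1.0
```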

5. Experimental results and analysis

Our test set consisted of multiple consecutive segments of healthy data and three segments of abnormal data, namely the continuous data preceding the abnormal points. The network only needs to learn the distribution of healthy data, and the training data consist solely of healthy data, so the anomalous data in the test set never appear during training. Large reconstruction errors could, however, arise simply because test data are unfamiliar rather than anomalous. To rule this out, multiple segments of healthy and anomalous data that never appeared in training were tested jointly. On these two unseen datasets (one composed of healthy data and the other of anomalous data), the reconstruction error for anomalous data is significantly higher than for healthy data, which indicates that the model generalizes rather than merely flagging unfamiliar inputs. The same procedure was also applied to equipment from another mining face, demonstrating that the model can perform anomaly detection in similar scenarios. By applying a sliding window to each segment, we calculated the reconstruction errors and evaluated comprehensive metrics such as accuracy and recall.

5.1 Ablation studies of our method

To validate the effectiveness of the proposed MSCM construction method, we conducted ablation experiments on the same batch of data, generating model inputs from different combinations of the correlation matrix construction methods. First, we trained the model on the healthy data from the training and validation sets. Then, we set the threshold to the 90th percentile of the reconstruction errors of the healthy data in one portion of the test set. Finally, we performed validation on the other portion of the test set, which included multiple segments of healthy data and three anomalous data samples.
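The threshold selection can be sketched as below with synthetic error values. The gamma distribution is chosen only for illustration, and `select_threshold` is a hypothetical name. The sketch also shows why roughly 10% of healthy windows exceed the threshold in Table 1. When the 90th percentile is estimated on one portion of healthy data, about 10% of other healthy windows from the same distribution are expected to exceed it. The 9% to 11% health row is therefore consistent with the chosen percentile.

```python
import numpy as np


def select_threshold(healthy_errors, q: float = 90.0) -> float:
    """Threshold = q-th percentile of reconstruction errors on held-out healthy windows."""
    return float(np.percentile(np.asarray(healthy_errors, dtype=float), q))


rng = np.random.default_rng(4)
# Hypothetical healthy-window errors, split into a threshold-selection part and an
# evaluation part (mirroring the two portions of the test set described above).
healthy = rng.gamma(shape=2.0, scale=0.01, size=2000)
calib, evaluate = healthy[:1000], healthy[1000:]
thr = select_threshold(calib)
false_alarm_pct = 100.0 * np.count_nonzero(evaluate > thr) / evaluate.size
print(round(thr, 4), round(false_alarm_pct, 1))   # the second value lands near 10
```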

Table 1 reports, for each input configuration, the percentage of sliding windows whose reconstruction error exceeds the threshold, both for the healthy data (averaged over multiple segments) and for each of the three abnormal segments. A single correlation matrix construction method may perform well on one type of anomaly yet miss others: the time-correlation matrix alone flags 100% of Abnormal 1 but only 3% of Abnormal 2 and 0% of Abnormal 3. Because equipment anomalies can alter the relationships among features in different, often nonlinear, ways, no single correlation measure captures all of them. The multi-scale correlation matrix (P + S + T) gives the best overall performance across the anomaly types. With sliding-window reconstruction, the proportion of windows exceeding the threshold stays above 30% for every abnormal segment (65%, 35%, and 33%), the highest mean over the three segments (about 44%), whereas for healthy data it is approximately 10% (9%). A rate near 10% on healthy data is expected, since the threshold is the 90th percentile of healthy reconstruction errors. This clear gap indicates that the model successfully distinguishes between healthy and abnormal states, achieving effective anomaly detection.

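Table 1 Results of ablation experiments for MSCM. Values are the percentage (%) of sliding windows whose reconstruction error exceeds the threshold; P = Pearson, S = Spearman, T = time correlation.

Data	Pearson	Spearman	Time correlation	P + S	P + T	S + T	P + S + T
Healthy data	9	11	10	9	10	10	9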
Abnormal 1	55	47	100	47	65	55	65
Abnormal 2	32	29	3	35	32	26	35
Abnormal 3	33	28	0	33	33	28	33
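The column labels in Table 1 refer to which correlation matrices are stacked as input channels. As a rough illustration of a "P + S" input, the sketch below computes Pearson and Spearman correlation matrices over one window of sensor readings and stacks them as two channels. The time-correlation channel is omitted because its definition is not restated in this section; average ranks for ties and returning 0 for a constant signal are assumptions of this sketch, not details from the paper.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
CorrFn = Callable[[Sequence[float], Sequence[float]], float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either signal is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def average_ranks(x: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def corr_matrix(channels: Sequence[Sequence[float]], fn: CorrFn) -> Matrix:
    """Pairwise correlation matrix over sensor channels (one list per sensor)."""
    return [[fn(a, b) for b in channels] for a in channels]


def stack_p_s(channels: Sequence[Sequence[float]]) -> List[Matrix]:
    """Two-channel input [Pearson, Spearman], analogous to the 'P + S' column."""
    return [corr_matrix(channels, pearson), corr_matrix(channels, spearman)]


if __name__ == "__main__":
    # Hypothetical window: three sensors, four time steps
    window = [[1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0], [4.0, 3.0, 2.0, 1.0]]
    p_mat, s_mat = stack_p_s(window)
    print(round(p_mat[0][1], 3), round(s_mat[0][1], 3))
```

The example pair (x and x squared) shows why combining measures helps: the relationship is perfectly monotonic, so Spearman reports 1, while Pearson reports a value below 1 because the relationship is not linear.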

5.2 Comparison with other models

We built autoencoders on classic network designs, used them to reconstruct the healthy data and detect anomalies, and compared the results with those of our model. We first selected traditional convolutional and recurrent neural networks, which capture only the spatial features or only the temporal dependencies of sequential data; these models are relatively simple and train quickly. We then used hybrid networks, CNN + LSTM and ConvLSTM, which handle both the spatial and temporal features of the input data. As shown in Fig. 3, the hybrid networks outperformed the traditional networks in anomaly detection. In our model, the CNN component extracts low-level local features in the initial layers and progressively captures higher-level abstract features as the network deepens. The ConvLSTM component uses convolutional operations to extract features from the input and LSTM memory cells to model temporal dependencies, which suits the closely coupled spatiotemporal behavior of mining machine operating data.

We also compared the proposed method with existing unsupervised anomaly detection approaches. Donut is, like our method, autoencoder-based, but relies on a variational autoencoder (VAE); GAE-M uses a Bi-LSTM combined with an attention mechanism but represents the data differently from our method; AMSL incorporates a self-supervised learning module; and TranAD is based on the Transformer architecture. Because these models differ in structure and input format, our dataset was adapted to each model's requirements. On our dataset, the proposed method achieves approximately 3% higher detection performance than the other models, indicating that it is more effective for anomaly detection in shearers.

5.3 Performance on other datasets

To validate the model's generalizability, we performed anomaly detection on mining machine operating condition data from another mining face in Inner Mongolia, China. The original dataset contains operating condition data from various components of the mining machine, including the shearer drum, traction system, rocker arm, and main pump, collected from October 21, 2019, to April 29, 2020. In this similar scenario, we applied the same data processing method to construct a multi-scale correlation matrix dataset and trained the model without altering its structure. The data were split 6:2:2 into training, validation, and test sets, and the test set contained multiple segments of healthy data and five segments of anomalous data. Using the 90th percentile of the reconstruction errors of the healthy data in the test set as the threshold, the model effectively detected multiple segments of anomalous data. The anomaly detection results are shown in Fig. 4.
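The 6:2:2 split can be sketched as below. The section does not say whether samples were shuffled; this sketch assumes a contiguous, chronological split, which keeps later data out of the training set and avoids leakage between overlapping windows. The function name `chronological_split` is illustrative, and integer weights are used so that the split sizes are not affected by floating-point rounding.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(items: Sequence[T],
                        weights: Tuple[int, int, int] = (6, 2, 2)
                        ) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered samples into contiguous train/validation/test blocks.

    Any remainder from integer division goes to the test block.
    """
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be three non-negative integers, not all zero")
    total = sum(weights)
    n = len(items)
    n_train = n * weights[0] // total
    n_val = n * weights[1] // total
    train = list(items[:n_train])
    val = list(items[n_train:n_train + n_val])
    test = list(items[n_train + n_val:])
    return train, val, test


if __name__ == "__main__":
    samples = list(range(10))  # stand-ins for time-ordered correlation-matrix samples
    print(chronological_split(samples))
```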

6. Conclusions

In this work, motivated by the large volume of mining machine operating condition data collected by sensors and by research on unsupervised learning, we propose an unsupervised learning-based approach for anomaly detection in mining machines. The approach comprises an MSCM construction module and the C-CLA network. Correlation matrices at various scales are constructed from the different components of the mining machine's operating condition data, considering linear and nonlinear correlations as well as temporal correlations. The C-CLA network is trained on healthy operating condition data from the mining machine, with multiple convolutional and ConvLSTM layers stacked to progressively learn higher-level feature representations. The lower convolutional layers capture local features, while the higher ConvLSTM layers capture abstract temporal relationships, enabling the network to reconstruct healthy data effectively and to identify abnormal patterns in the input when the mining machine is in an anomalous state. Analyzing reconstruction errors on correlation matrices of different scales makes anomaly detection more comprehensive, because different scales capture anomalies at different levels and show how an anomaly manifests across them. Since fully mechanized mining faces now store vast amounts of unlabeled data, the proposed approach addresses both the coupling between equipment components and the difficulty of obtaining fault-labeled data in mining machine monitoring, and it saves substantial annotation costs. Effective detection of anomalous states can guide equipment maintenance, increase production efficiency, and improve economic benefits, giving the approach considerable potential for practical application.


About this article

Cite this article

Song, Y., Wang, W., Wu, Y. et al. Unsupervised anomaly detection in shearers via autoencoder networks and multi-scale correlation matrix reconstruction. Int J Coal Sci Technol 11, 79 (2024).

https://doi.org/10.1007/s40789-024-00730-9

Received: 30 May 2023
Revised: 19 August 2023
Accepted: 05 August 2024