Coal burst spatio-temporal prediction method based on bidirectional long short-term memory network

Xu Yang; Yapeng Liu; Anye Cao; Yaoqi Liu; Changbin Wang; Weiwei Zhao; Qiang Niu

doi:10.1007/s40789-025-00759-4

Research Article

Open Access

Published: 10 February 2025

International Journal of Coal Science & Technology Volume 12, article number 11, (2025)

Xu Yang ^1,2,
Yapeng Liu ^1,2,
Anye Cao ^3,
Yaoqi Liu ^3,
Changbin Wang ^4,
Weiwei Zhao ^3,
Qiang Niu ^1,2

1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
2. Engineering Research Center of Mine Digitization of Ministry of Education, China University of Mining and Technology, Xuzhou, China
3. School of Mines, China University of Mining and Technology, Xuzhou, China
4. State Key Laboratory of Coal Resources and Safe Mining, China University of Mining and Technology, Xuzhou, China

Abstract

The increasingly severe state of coal burst disaster has emerged as a critical factor constraining coal mine safety production, and it has become a challenging task to enhance the accuracy of coal burst disaster prediction. To address the issue of insufficient exploration of the spatio-temporal characteristic of microseismic data and the challenging selection of the optimal time window size in spatio-temporal prediction, this paper integrates deep learning methods and theory to propose a novel coal burst spatio-temporal prediction method based on Bidirectional Long Short-Term Memory (Bi-LSTM) network. The method involves three main modules, including microseismic spatio-temporal characteristic indicators construction, temporal prediction model, and spatial prediction model. To validate the effectiveness of the proposed method, engineering application tests are conducted at a high-risk working face in the Ordos mining area of Inner Mongolia, focusing on 13 high-energy microseismic events with energy levels greater than 10⁵ J. In terms of temporal prediction, the analysis indicates that the temporal prediction results consist of 10 strong predictions and 3 medium predictions, and there is no false alarm detected throughout the entire testing period. Moreover, compared to the traditional threshold-based coal burst temporal prediction method, the accuracy of the proposed method is increased by 38.5%. In terms of spatial prediction, the distribution of spatial prediction results for high-energy events comprises 6 strong hazard predictions, 3 medium hazard predictions, and 4 weak hazard predictions.

1. Introduction

Coal burst is one of the common and significant dynamic hazards in the coal mining process (Bai et al. 2022; Zhao and Jiang 2010). It not only results in increased mining costs and reduced mine production efficiency but also influences the safety and lives of employees (He et al. 2017; He et al. 2018a, b; Zhao et al. 2023). Consequently, the monitoring and early warning of coal burst has become a research hotspot in the field of coal mine safety (Zhang et al. 2017; Wang et al. 2024).

The microseismic monitoring method is currently one of the most widely used early warning methods for coal burst (Dou et al. 2018; Maxwell et al. 2010; Li et al. 2023; Xu et al. 2023; Liu et al. 2024a, b; Liu et al. 2024a, b). The occurrence of a coal burst is frequently accompanied by a series of microseismic events, which contain precursory information preceding a coal burst. The microseismic monitoring method accurately determines the time and location of coal burst by analyzing this information, thereby effectively achieving the predictions of coal burst (Ma et al. 2018; Zhang et al. 2021; Lu et al. 2005; Soleimani et al. 2023; Xu et al. 2024). Predicting the risk of coal burst using microseismic monitoring data is currently a prominent research area, with related studies focusing on temporal and spatial prediction. Temporal prediction of coal burst (Pu et al. 2018, 2019; Wu et al. 2018) refers to predicting the occurrence probability of coal burst from the temporal dimension. Chen et al. (2022) integrated the deep learning model MSNet to build an intelligent early warning platform for coal burst to dynamically display predicted microseismic events. Cao et al. (2022) explained the shortcomings of using only physical indicators to predict coal burst, and introduced a temporal prediction method for coal burst that combines physical indicators and data features to predict high-energy events with different levels of impact risk. Malkowski et al. (2021) utilized the changing trend of the hazard autoregressive coefficient and parameter b value to predict the probability of high-energy microseismic occurrences. Wang et al. (2022) utilized various statistical methods to analyze the spatio-temporal correlation of energy in microseismic events. Spatial prediction of coal burst (Zhang et al. 2021; Chen et al. 2023; Cai et al. 2018) refers to predicting the probability of coal burst in different areas from the spatial dimension. Wang et al. (2021) introduced an enhanced mining earthquake clustering method, which leverages the robust correlation between high Number of Possible Clustered Events (NPCE) values and coal burst occurrences. Liu et al. (2023) conducted an investigation that utilized theoretical analysis, numerical simulations, and other methods to develop a coal burst hazard prediction method based on earthquake focal mechanisms and positioning error calibration. Cai et al. (2016) proposed a reliable comprehensive evaluation model for the coal burst liability based on the Principal Component Analysis and Fuzzy Comprehensive Evaluation (PCA–FCE) theory and experimental data. Lurka (2021) used a hierarchical clustering method to evaluate the spatial risk potential of microseismic events. Existing efforts have yielded considerable results in the field of coal burst prediction. However, there are still some issues that need to be addressed. In the context of temporal and spatial prediction, separate selections of different time windows hinder the integration of prediction results from both temporal and spatial dimensions. Furthermore, most existing microseismic indicators for coal burst prediction are predominantly constructed based on either a single temporal or spatial dimension, resulting in limited accuracy in predictions and posing challenges in meeting the requirements of practical engineering applications (He et al. 2019; Majid et al. 2023).

To address the above issues, this paper proposes a novel spatio-temporal prediction method for coal burst based on Bidirectional Long Short-Term Memory (Bi-LSTM) network. This method includes three main modules: microseismic spatio-temporal characteristic indicators construction, temporal prediction model, and spatial prediction model. In the first module, with the consideration of both temporal and spatial aspects of the microseismic data, we construct microseismic spatio-temporal characteristic indicators that reflect the spatio-temporal aggregation features, which provides the foundation for the temporal and spatial predictions. In the second module, we propose a novel temporal prediction model based on Bi-LSTM for coal burst, which incorporates the storage and updating mechanisms for both long-term and short-term information, enabling the effective capture of dependencies in long sequences, thereby achieving accurate temporal prediction for coal burst. In the third module, based on the working principle of Bi-LSTM, we propose a coal burst spatial prediction model by integrating both long-time and short-time windows. This approach analyzes both the long-term global features in long-time windows and the short-term local features in short-time windows, facilitating spatial prediction for coal burst. The temporal and spatial prediction models presented in this study can collaboratively provide spatio-temporal prediction for coal burst.

2. Overview of method

Figure 1 shows the structure of the coal burst spatio-temporal prediction method based on Bi-LSTM, which mainly includes three modules: microseismic spatio-temporal characteristic indicators construction, temporal prediction model, and spatial prediction model.

Microseismic spatio-temporal characteristic indicators construction takes into account the spatio-temporal correlation within microseismic data, which involves creating a temporal dataset by merging microseismic data. The Principal Component Analysis (PCA) method is utilized to condense the initial 4-dimensional dataset into a 2-dimensional representation, effectively reducing dimensionality while preserving the essential information within the dataset. Subsequently, the Kernel Density Estimation (KDE) method is applied to the dimensionally reduced 2-dimensional data, producing probability density values for each data point. These probability density values are defined as microseismic spatio-temporal characteristic indicators, offering insight into the extent of data aggregation during the given time period. These indicators hold significant relevance for subsequent temporal and spatial prediction applications.

Temporal prediction entails the creation of the precursor pattern derived from the processed time series dataset of microseismic data. In order to predict future coal burst risk levels, a deep learning model utilizing Bi-LSTM is developed. Specifically, the time series dataset is subjected to statistical analysis to extract key metrics such as maximum energy, average energy, and frequency of microseismic events within the time series. A new time series dataset is then generated using the microseismic spatio-temporal characteristic indicators, from which the precursor pattern sequence is constructed. The Bi-LSTM is utilized to extract features from this precursor pattern sequence, obtaining regression loss and data features. The backpropagation method is adopted to continuously update the parameters in the neural network model to minimize the loss of the model on the training data set. Finally, a fully connected network is utilized to achieve precise predictions in the coal burst temporal prediction model.

Spatial prediction involves creating separate spatial cloud maps for risk areas in both long-time and short-time windows and subsequently generating a cloud map of the hazardous region through weight superposition to identify the hazardous area. In detail, the process involves the creation of data for both long-time and short-time windows based on the working principle of Bi-LSTM. The microseismic spatio-temporal characteristic indicators within these time windows are normalized, and the corresponding weights are calculated. The kriging interpolation method is then applied to process the gridded data from both long-time and short-time windows. By using weighted overlap, the fused time window data is utilized to construct a cloud map of the risk area. The hazardous area is determined based on this cloud map, finalizing the development of the spatial prediction method.

The temporal prediction model and the spatial prediction method proposed in this article can work together. The temporal prediction model provides a suitable time window for the spatial prediction method. When the temporal prediction indicates danger, the spatial prediction method can be used to identify the dangerous area; when the dangerous area is predicted to lie in a special geological structure, the temporal prediction method can also be applied in advance to assess the danger.

3. Detailed design

3.1 Spatio-temporal characteristic indicators construction

We introduce novel microseismic feature indicators based on a PCA and KDE fusion data processing method. These indicators can consider both microseismic temporal and spatial information, and reflect the spatio-temporal aggregation degree of microseismic data. The specific process is shown in Fig. 2.

First, we utilize a consistent time window to accumulate microseismic original data. Assuming that the dataset record of the i-th microseismic event is $m_{i}$, it can be expressed as:

$$m_{i} = \left[ {x_{i} ,y_{i} ,z_{i} ,t_{i} } \right]$$

(1)

where, $x_{i}$, $y_{i}$, $z_{i}$ represent the source coordinates $X$, $Y$, $Z$ within the i-th microseismic event and $t_{i}$ is the microseismic time within the i-th microseismic event. Assume that the microseismic data record in the j-th time window is $M_{j}$ and the number of microseismic data is $n$, which can be expressed as:

$$M_{j} = \left[ {m_{0} ,m_{1} ,...,m_{n - 1} } \right]$$

(2)

Therefore, when the data set is divided into $k$ time windows, the time series data set $M$ can be expressed as:

$$M = \left[ {M_{0} ,M_{1} ,...,M_{k - 1} } \right]$$

(3)
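The windowing in Eqs. (1) to (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_time_windows`, the use of hours since the start of monitoring for $t_{i}$, and the 6-h default are assumptions made for the example.

```python
from collections import defaultdict


def build_time_windows(events, window_hours=6.0):
    """Group events m_i = (x, y, z, t) into consecutive windows M_0 .. M_{k-1}.

    Assumption: t is given in hours since the start of monitoring (t >= 0).
    """
    if not events:
        return []
    buckets = defaultdict(list)
    for x, y, z, t in events:
        # Window index j is the integer number of whole windows before t.
        buckets[int(t // window_hours)].append((x, y, z, t))
    k = max(buckets) + 1
    # Keep empty windows so that index j always maps to the same time span.
    return [buckets.get(j, []) for j in range(k)]


# Hypothetical events: (x, y, z, t)
events = [
    (10.0, 5.0, -700.0, 0.5),
    (12.0, 6.0, -705.0, 5.9),
    (30.0, 8.0, -710.0, 6.1),
    (31.0, 9.0, -712.0, 13.0),
]
M = build_time_windows(events)
print([len(M_j) for M_j in M])  # [2, 1, 1]
```

Keeping empty windows in the list preserves the one-to-one mapping between the index $j$ and a fixed time span, which later steps that operate on consecutive windows rely on.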

The time series dataset $M$ comprises 4-dimensional data, encompassing microseismic source coordinates $X$, $Y$, $Z$ and time $t$. In cases where an individual $M_{j}$ contains an excessive data dimensionality, the large spatial distances between each data point $m_{i}$ pose challenges for data analysis, resulting in diminished visualization effectiveness.

To address the above issues, we utilize the PCA method (Cai et al. 2016; He et al. 2018a, b) for dimensionality reduction of microseismic data. PCA is a widely used technique for dimensionality reduction and data analysis, which transforms the original high-dimensional data into a new set of lower-dimensional features through linear transformation. The objective of PCA is to identify this new set of features in a way that maximizes the variance of data in the new feature space, thereby preserving the most essential data information. PCA has the ability to reduce high-dimensional data to lower-dimensional data while preserving the primary information within the dataset. These newly derived features are known as principal components $t_{1}$ to $t_{4}$. Principal components are ordered by decreasing data variance, with the initial components capturing the majority of the data variance. This enables the representation of the original data in a reduced-dimensional space. Consequently, through this method, the microseismic event $m_{i}^{\prime}$ is reduced to 2 dimensions, resulting in the dimensionally reduced $M_{j}^{\prime}$, which can be expressed as:

$$M_{j}^{\prime} = \left[ {m_{0}^{\prime} ,m_{1}^{\prime} ,...,m_{n - 1}^{\prime} } \right]$$

(4)
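As a rough sketch of the reduction from $M_{j}$ to $M_{j}^{\prime}$, PCA can be computed from the eigendecomposition of the covariance matrix. The helper name `pca_reduce` is hypothetical, and standardising each column before PCA is an assumption made here because coordinates and time have different units; the source does not state how the data are scaled.

```python
import numpy as np


def pca_reduce(M_j, n_components=2):
    """Project the (n, 4) records [x, y, z, t] onto the leading principal components."""
    X = np.asarray(M_j, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid division by zero for constant columns
    Z = (X - X.mean(axis=0)) / std           # standardisation (assumption, see lead-in)
    cov = np.cov(Z, rowvar=False)            # 4 x 4 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]           # columns are the principal directions
    explained = eigvals[order] / eigvals.sum()
    return Z @ components, explained


rng = np.random.default_rng(0)
M_j = rng.normal(size=(50, 4))               # synthetic stand-in for one time window
M_j_reduced, explained = pca_reduce(M_j)
print(M_j_reduced.shape)                     # (50, 2)
```

The `explained` ratios show how much of the total variance the two retained components carry, which is a quick check on whether a 2-dimensional representation is adequate for a given window.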

Furthermore, in order to explore the correlation between $m_{i}^{\prime}$ in $M_{j}^{\prime}$, we utilize the KDE method (Huang et al. 2022a, b; Mi et al. 2022), which is a non-parametric statistical technique utilized for estimating the probability density function of unknown random variables. It estimates the entire probability density function by placing a miniature probability density function, called the kernel function or window function, on each data point and then aggregating all these kernel functions. KDE proves effective in revealing the distribution characteristic of data, enabling the analysis of data structure and features. The specific formula for KDE (Dehnad 1987) can be expressed as:

$$\hat{f}_{h} (x) = \frac{1}{nh}\sum\limits_{j = 1}^{n} {K\left( {\frac{{x - x_{j} }}{h}} \right)}$$

(5)

where $\hat{f}_{h} (x)$ is the probability density estimate, $n$ is the number of sample data points, $x_{j}$ is the j-th sample data point, $K( \cdot )$ is the kernel function, indicating the local probability density of each data point, and $h$ is the bandwidth parameter, which controls the width of the kernel function and thus the smoothness of the estimate.
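Eq. (5) can be evaluated directly. The sketch below is one-dimensional and uses a Gaussian kernel; both choices are assumptions for illustration, since the source applies KDE to 2-dimensional data and does not name the kernel or the bandwidth.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as K(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Evaluate f_h(x) = 1/(n*h) * sum_j K((x - x_j) / h), as in Eq. (5)."""
    n = len(samples)
    return sum(gaussian_kernel((x - x_j) / h) for x_j in samples) / (n * h)


samples = [0.0, 0.1, 0.2, 3.0]   # three clustered points and one isolated point
dense = kde(0.1, samples, h=0.5)
sparse = kde(3.0, samples, h=0.5)
print(dense > sparse)            # True: the clustered region has the higher density
```

A point surrounded by many neighbours receives a high density, which is why the density value can serve as a measure of aggregation.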

We apply KDE to $M_{j}^{\prime}$ to derive the probability density $\rho_{i}$ at each data point $m_{i}^{\prime}$ of the microseismic dataset. This probability density indicates how strongly $m_{i}^{\prime}$ is correlated with the other points in $M_{j}^{\prime}$. Detecting such correlations among microseismic events early can therefore benefit the prediction of high-energy events. In this paper, the probability densities derived from $M_{j}^{\prime}$ form the set of microseismic spatio-temporal characteristic indicators $U_{j}$, which is subsequently used to analyze each data point in both spatial and temporal contexts. The set of the microseismic spatio-temporal characteristic indicators $U_{j}$ in the j-th time window is specifically expressed as:

$$U_{j} = \left[ {\rho_{0} ,\rho_{1} ,...,\rho_{n - 1} } \right]$$

(6)

3.2 Temporal prediction model

3.2.1 Data preprocessing

In order to make the data suitable for model prediction and training, the original microseismic data including microseismic occurrence time, source coordinates, energy magnitude, and microseismic spatio-temporal characteristic indicators are preprocessed. Each $M_{j}$ is statistically analyzed, and the resulting data record $s_{j}$ can be expressed as:

$$s_{j} = \left[ {id,e_{\max } ,e_{\text{avg}} ,f,\rho_{\max } } \right]$$

(7)

where $id$ is the time window number, $e_{\max }$ is the maximum energy of microseismic events within the time window, $e_{\text{avg}}$ is the average energy of microseismic events within the time window, $f$ is the frequency of microseismic events within the time window, and $\rho_{\max }$ is the maximum value of the spatio-temporal characteristic indicators of microseismic events within the time window. When the data set is divided into $k$ time windows, the time series data set $S$ can be expressed as:

$$S = \left[ {s_{0} ,s_{1} ,...,s_{k - 1} } \right]$$

(8)
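A record $s_{j}$ as in Eq. (7) might be computed per window as sketched below. The function name `window_stats`, the interpretation of the frequency $f$ as the event count in the window, and the all-zero record for an empty window are assumptions made for illustration.

```python
def window_stats(window_id, energies, densities):
    """Return s_j = [id, e_max, e_avg, f, rho_max] for one time window.

    energies:  event energies (J) in the window
    densities: spatio-temporal characteristic indicators of the same events
    """
    if not energies:
        # Assumption: an empty window is encoded as an all-zero record.
        return [window_id, 0.0, 0.0, 0, 0.0]
    return [
        window_id,
        max(energies),
        sum(energies) / len(energies),
        len(energies),           # frequency taken as the event count (assumption)
        max(densities),
    ]


# Hypothetical window with three events
s_0 = window_stats(0, [2.0e3, 5.0e4, 8.0e2], [0.12, 0.40, 0.05])
print(s_0)  # [0, 50000.0, 17600.0, 3, 0.4]
```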

Furthermore, the label set $R$ is constructed according to the time series dataset $S$. $r_{j}$ represents the label of the j-th time window, and $R$ can be expressed as:

$$R = \left[ {r_{0} ,r_{1} ,...,r_{k - 1} } \right]$$

(9)

Table 1 gives the classification of coal burst risk corresponding to microseismic energy, developed for the study mining area. We construct labels for the time window data according to Table 1.

Table 1 Classification of coal burst risk corresponding to microseismic energy

Label	Maximum energy	Risk level
0	$e_{\max }$ < 1 × 10³ J	No
1	1 × 10³ J ≤ $e_{\max }$ < 1 × 10⁴ J	Weak
2	1 × 10⁴ J ≤ $e_{\max }$ < 1 × 10⁵ J	Medium
3	$e_{\max }$ ≥ 1 × 10⁵ J	Strong
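The thresholds in Table 1 translate into a simple labelling function. The names `risk_label` and `RISK_NAMES` are hypothetical; the thresholds are those of Table 1.

```python
RISK_NAMES = {0: "No", 1: "Weak", 2: "Medium", 3: "Strong"}


def risk_label(e_max):
    """Map the maximum energy (J) in a time window to the label of Table 1."""
    if e_max < 1e3:
        return 0
    if e_max < 1e4:
        return 1
    if e_max < 1e5:
        return 2
    return 3


for energy in (5.0e2, 1.0e3, 5.0e4, 1.0e5):
    print(energy, RISK_NAMES[risk_label(energy)])
```

Each lower bound is inclusive, so an event of exactly 1 × 10⁵ J is labelled strong.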

A precursor pattern sequence is constructed based on the time series dataset, with the i-th precursor pattern sequence represented as $W_{i}$, which can be expressed as:

$$W_{i} = \left[ {s_{i \times g} ,s_{i \times g + 1} ,...,s_{i \times g + p - 1} } \right]$$

(10)

where $g$ is the sampling step and $p$ is the length of the precursor pattern sequence. From this, the set $S_{W}$ of $q$ precursor pattern sequences is constructed as:

$$S_{W} = \left[ {W_{0} ,W_{1} ,...,W_{q - 1} } \right]$$

(11)
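Eqs. (10) and (11) describe a strided sliding window over $S$. A minimal sketch follows; the function name `build_precursor_sequences` is hypothetical, and the formula used for $q$ (the number of complete sequences that fit in $S$) is an assumption, since the source does not state how $q$ is obtained.

```python
def build_precursor_sequences(S, p=12, g=1):
    """Return S_W = [W_0, ..., W_{q-1}] with W_i = S[i*g : i*g + p]."""
    if len(S) < p:
        return []
    q = (len(S) - p) // g + 1        # number of complete sequences (assumption)
    return [S[i * g: i * g + p] for i in range(q)]


S = list(range(15))                  # stand-in for 15 window records s_0 .. s_14
S_W = build_precursor_sequences(S, p=12, g=1)
print(len(S_W))                      # 4
print(S_W[1][0], S_W[1][-1])         # 1 12
```

With $g = 1$, consecutive sequences overlap in $p - 1$ records; a larger $g$ reduces the overlap and the number of training samples.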

3.2.2 Temporal prediction model construction

We construct a coal burst temporal prediction model based on the Bi-LSTM, an extension of LSTM, which utilizes a model composed of two one-way LSTMs. LSTM (Huang et al. 2022a, b; Shu et al. 2022) excels in processing time series data through its gating mechanism, selectively incorporating or retaining information with forget, input, and output gates. These gates establish connections in microseismic event sequences, facilitating input and output processes. In a trained network, the activation of the forget gate tends toward 1 when the stored information should be retained, enabling long-term memory, while the activation of the input gate approaches 1 for significant new information. This effectively captures long-term dependencies and mitigates the vanishing gradient problem of traditional recurrent neural networks. Bi-LSTM processes data in both directions, allowing the model to consider contextual information simultaneously for more accurate predictions. Consequently, when using Bi-LSTM, the model can comprehensively analyze past and future time period data, facilitating thorough data information learning during the training process. For these reasons, Bi-LSTM is selected as a key component of our model construction.

Figure 3 shows the structure of the temporal prediction model. The set of precursor pattern sequences $S_{W}$ mentioned above is input into the Bi-LSTM for training, enabling the extraction of implicit features. The model takes the i-th precursor pattern sequence $W_{i}$ of size 12 × 4 as input, with a batch size of $N_{\text{B}}$. It undergoes processing through 1 Bi-LSTM layer and 4 fully connected layers, and ultimately, the Softmax function is applied to produce an output vector of size 4 × 1. The Bi-LSTM layer has a cell size of 100, calculating the data both in the forward and backward directions. The results from the 2 directions, hidden state $h_{N - 1}^{L}$ and hidden state $h_{N - 1}^{R}$, are then concatenated to produce an output with a dimensionality of 200. This study employs the cross-entropy loss function to calculate the classification loss. The backpropagation method is utilized to iteratively update the parameters within the neural network model, reducing the loss of the model on the training dataset. This process completes the construction of the neural network model and facilitates the prediction of future microseismic event levels.
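The architecture described above could be sketched in PyTorch as follows. This is an illustration under stated assumptions rather than the authors' implementation: the class name `BiLSTMClassifier`, the widths of the first three fully connected layers (128, 64, 32) and the ReLU activations are not given in the source and are chosen here only to make the example concrete.

```python
import torch
from torch import nn


class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=4, hidden=100, n_classes=4):
        super().__init__()
        # One bidirectional LSTM layer with 100 cells per direction.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        # Four fully connected layers; intermediate widths are assumptions.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        # x: (batch, 12, 4); h_n: (2, batch, hidden), one final state per direction.
        _, (h_n, _) = self.lstm(x)
        features = torch.cat([h_n[0], h_n[1]], dim=1)   # (batch, 200)
        return self.head(features)                      # raw class scores (logits)


model = BiLSTMClassifier()
batch = torch.randn(8, 12, 4)            # 8 synthetic precursor pattern sequences
logits = model(batch)
probs = torch.softmax(logits, dim=1)     # probabilities over the 4 risk levels
print(logits.shape, probs.sum(dim=1))
```

The model returns unnormalised scores because `nn.CrossEntropyLoss` applies the log-softmax internally during training; the explicit softmax is only needed when probabilities are reported.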

In the temporal prediction model, the fully connected layer maps the learned distributed feature representation into the sample label space, which can play a role in classification. Simultaneously, the activation function is utilized for normalization, generating the probabilities for each coal burst risk level. The class with the largest probability serves as the output of the model, and this is subsequently converted into one of the 4 coal burst risk levels: strong, medium, weak, or no risk. Thus, the prediction of the coal burst level is accomplished, where the activation function can be expressed as:

$${\text{Softmax}}\left( {x_{k} } \right) = \frac{{e^{{x_{k} }} }}{{\sum\nolimits_{c = 0}^{3} {e^{{x_{c} }} } }}$$

(12)

where $x_{k}$ is the raw output value of the network for the k-th class, and the softmax function transforms the output values of the multi-classification task into probabilities in the [0,1] range that sum to 1.
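A small numerical sketch of Eq. (12), treating $x_{k}$ as the raw network output for class $k$. Subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the result unchanged; the scores below are made up for illustration.

```python
import math


def softmax(scores):
    """Normalise raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]   # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.2, 1.5, 3.1, 0.7]                  # hypothetical outputs for labels 0..3
probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)                               # 2, i.e. the medium risk level
```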

3.3 Spatial prediction model

Microseismic spatio-temporal characteristic indicators provide a spatial representation of the correlation among data points within the microseismic dataset and the degree of aggregation among them. When a high-energy event is situated in a region with a large probability, it indicates a substantial correlation between the high-energy event and each $m_{i}^{\prime}$ within $M_{j}^{\prime}$.

In order to make full use of the influence of both long-time and short-time microseismic data on spatial prediction, we introduce a spatial prediction method for coal burst based on the working principle of Bi-LSTM, which integrates long-time and short-time windows. The central concept of this method involves using microseismic spatio-temporal characteristic indicators to create a spatial prediction cloud map for short-time windows $T_{1}$, with settings aligning with the optimal time window selection from the aforementioned temporal prediction model. Subsequently, a spatial prediction cloud map for long-time windows $T_{2}$ is generated. Finally, a weighted method is applied to merge the spatial prediction cloud map from the long-time and short-time windows. The specific calculation process of this method is as follows.

Assume that $E_{1}$ and $E_{2}$ are the grid matrices of the microseismic spatio-temporal characteristic indicators in the short-time window $T_{1}$ and the long-time window $T_{2}$, respectively, and $E_{3}$ is the grid matrix of the microseismic spatio-temporal characteristic indicators fused in the long-time and short-time windows.

$$E_{3} = W_{{T_{1} }} \times E_{1} + W_{{T_{2} }} \times E_{2}$$

(13)

We perform grid mapping of the working surface coordinates, filling in the normalized microseismic spatio-temporal characteristic indicators for both the short-time window $T_{1}$ and the long-time window $T_{2}$. Subsequently, the Kriging interpolation method is utilized to complete the gridding process, resulting in the grid matrices $E_{1}$ and $E_{2}$. Figure 4 shows the spatial prediction cloud maps of the hazard area. In the short-time window (Fig. 4a), the predicted spatial range is large and cannot effectively reflect the real hazard area. In contrast, the cloud map of the hazard area in the long-time window (Fig. 4b) does not effectively reveal short-time microseismic characteristics. To address this, we utilize a weighted superposition approach, merging matrices $E_{1}$ and $E_{2}$ to obtain a microseismic spatio-temporal characteristic indicators grid matrix $E_{3}$, which amalgamates data from both the long-time and short-time windows. Based on this grid matrix, a spatial cloud map is established and the hazardous area is determined from it (Fig. 4c).

The weight determination process is as follows. First, we select the microseismic data for both the $T_{1}$ and $T_{2}$ time windows, and the microseismic spatio-temporal characteristic indicators are calculated. These indicators are then normalized, and the variances of the respective characteristic indicators are computed. The greater the data variance, the more dispersed the data is and the greater the amount of information contained in it. It is assumed that indicators with a larger variance carry more information and are consequently assigned a greater weight (Cai et al. 2020). Additionally, the data volume in each unit time window is tallied. The occurrence of large-energy events is often accompanied by the occurrence of a large number of microseismic events. At the same time, the greater the amount of data, the more information it contains. It is assumed that indicators with a higher data volume within the unit time window contain more information and are assigned a proportionally greater weight. Taking both considerations into account, the weight values for the $T_{1}$ and $T_{2}$ time window data are determined separately and can be expressed as:

$$W_{{T_{1} }} = \frac{{\frac{{\sigma_{{T_{1} }} }}{{\sigma_{{T_{1} }} + \sigma_{{T_{2} }} }} + \frac{{L_{{T_{1} }} }}{{L_{{T_{1} }} + L_{{T_{2} }} }}}}{2}$$

(14)

$$W_{{T_{2} }} = \frac{{\frac{{\sigma_{{T_{2} }} }}{{\sigma_{{T_{1} }} + \sigma_{{T_{2} }} }} + \frac{{L_{{T_{2} }} }}{{L_{{T_{1} }} + L_{{T_{2} }} }}}}{2}$$

(15)

where $\sigma_{{T_{1} }}$ and $\sigma_{{T_{2} }}$ are the variances of the time window data for $T_{1}$ and $T_{2}$ respectively, $L_{{T_{1} }}$ and $L_{{T_{2} }}$ are the number of microseismic events in the unit time window for the $T_{1}$ and $T_{2}$ time window data respectively.
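Eqs. (13) to (15) can be checked numerically with the sketch below. The function names, the use of the population variance, and the sample values are assumptions for illustration; the grids stand in for the Kriging-interpolated matrices $E_{1}$ and $E_{2}$.

```python
from statistics import pvariance


def fusion_weights(rho_t1, rho_t2, count_t1, count_t2):
    """Weights of Eqs. (14) and (15) from indicator variances and event counts.

    rho_t1, rho_t2:     normalised indicators in the short and long windows
    count_t1, count_t2: events per unit time window for T1 and T2
    Note: the variances (and the counts) must not both be zero.
    """
    var1, var2 = pvariance(rho_t1), pvariance(rho_t2)
    w1 = 0.5 * (var1 / (var1 + var2) + count_t1 / (count_t1 + count_t2))
    w2 = 0.5 * (var2 / (var1 + var2) + count_t2 / (count_t1 + count_t2))
    return w1, w2


def fuse_grids(E1, E2, w1, w2):
    """Element-wise E3 = w1 * E1 + w2 * E2, as in Eq. (13)."""
    return [[w1 * a + w2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(E1, E2)]


w1, w2 = fusion_weights([0.1, 0.9, 0.5, 0.3], [0.4, 0.5, 0.6, 0.5], 12, 8)
E3 = fuse_grids([[1.0, 0.0]], [[0.0, 1.0]], w1, w2)
print(round(w1 + w2, 12))   # 1.0
```

By construction the two weights always sum to 1, so $E_{3}$ is a convex combination of $E_{1}$ and $E_{2}$ and stays in the same [0, 1] range as the normalised inputs.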

Table 2 gives the spatial hazard area division criterion based on microseismic spatio-temporal characteristic indicators. We use this division criterion to classify the level of hazard in spatial areas.

Table 2 A spatial hazardous area division criterion based on microseismic spatio-temporal characteristic indicators

Microseismic spatio-temporal characteristic indicators	Risk level
0 ≤ $\rho$ < 0.2	No
0.2 ≤ $\rho$ < 0.45	Weak
0.45 ≤ $\rho$ < 0.7	Medium
0.7 ≤ $\rho$ ≤ 1	Strong
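The division criterion of Table 2 can be applied to each cell of the fused grid. A minimal sketch, with the hypothetical function name `spatial_risk_level`:

```python
def spatial_risk_level(rho):
    """Map a normalised indicator value in [0, 1] to the risk level of Table 2."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be normalised to [0, 1]")
    if rho < 0.2:
        return "No"
    if rho < 0.45:
        return "Weak"
    if rho < 0.7:
        return "Medium"
    return "Strong"


grid = [[0.05, 0.30], [0.60, 0.95]]           # hypothetical fused grid values
levels = [[spatial_risk_level(v) for v in row] for row in grid]
print(levels)  # [['No', 'Weak'], ['Medium', 'Strong']]
```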

4. Method test and engineering application

4.1 Site characteristics

To validate the efficacy of the proposed Bi-LSTM-based coal burst spatio-temporal prediction method in both temporal and spatial prediction, extensive experiments are conducted using microseismic data from LW2215 of Yingpanhao Coal Mine in Ordos, Inner Mongolia Autonomous Region, China. The mine utilizes a large lane strip layout, with the 2–2 coal seam mined at an average burial depth of 731.4 m. The inclination angle of the coal seam falls within the range of 0° to 4°, averaging 2°, and the thickness of the coal seam varies from 5.64 m to 7.33 m, with an average thickness of 6.41 m. Figure 5 shows the LW2215 layout and geophone distribution. Fully-mechanized coal mining technology is adopted, utilizing retreating mining and roof caving for roof management. The long wall sections on the north and south sides are designated as LW2217 and LW2213 respectively.

The microseismic monitoring system “SOS”, developed by the Central Mining Research Institute of Poland and installed underground in Yingpanhao Coal Mine, provides continuous monitoring of mine activities. This system is equipped with real-time monitoring recorders, digital transmission systems, analyzers, and sensors. The sensors utilized are single vertical component detectors with a frequency range of 1–600 Hz, a sampling rate of 500 Hz, a maximum data transmission rate of 1 Mb/s, and A/D conversion with 16-bit precision. This system allows for accurate determination of the time of microtremor occurrence, the released seismic energy, and the 3-dimensional coordinates. Microseismic events are located using the Powell positioning algorithm with a uniform velocity model. In the mine, there are 16 sensors, with 7 of them positioned near LW2215 in Fig. 5. The primary data source for this article comprises 9969 pieces of microseismic data collected from January to April.

4.2 Evaluation of temporal prediction method

4.2.1 Temporal prediction model training

We utilize a total of 9969 microseismic events from LW2215 in the months of January to May 2022 as the dataset for this experiment. During the training process, the data set is divided at a ratio of 8:2 into a training set and a test set. The microseismic data from January to May is organized into 6-h time windows using a sliding window approach with a step that also spans 6 h. Analysis of the data in these time windows revealed a total of 27 high-energy events. Notably, during the period from March to April, there is a frequent occurrence of high-energy events with an energy level exceeding 10⁵ J, demonstrating that the data within this time range is suitable for model training. Table 3 lists the parameters of data preprocessing. During the model training process, the precursor pattern sequence $W_{i}$ is utilized as the input. The training parameters are specified as follows: 200 training epochs, a batch size of 128, a decay rate of 0.1, and a learning rate of 0.001. The loss function utilized is the cross-entropy loss function, and the evaluation metrics include accuracy and F1 value. The softmax activation function is applied for normalization to derive the probability of each level of microseismic events, thus providing the predicted level of future microseismic events, where the cross-entropy loss function can be expressed as (Duan et al. 2021):

$${\mathcal{L}}_{\text{CE}} = - \sum\limits_{j = 0}^{C - 1} {y_{j} \log \left( {p_{j} } \right)}$$

(16)

where $p_{j}$ is the predicted probability that the sample belongs to the j-th class, $y_{j}$ is the true label for the j-th class, and $C$ is the number of classes.
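For a single sample with a one-hot true label, Eq. (16) reduces to the negative log of the probability assigned to the true class. The sketch below illustrates this; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero, and the probabilities are made up.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy of Eq. (16) for one sample; y_true is a one-hot vector."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


y_true = [0, 0, 0, 1]                     # actual level: strong
confident = cross_entropy(y_true, [0.1, 0.1, 0.1, 0.7])
hesitant = cross_entropy(y_true, [0.3, 0.3, 0.2, 0.2])
print(confident < hesitant)               # True: a better prediction gives a lower loss
```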

Table 3 The parameters of data preprocessing

Parameter	The meaning of parameter
$I$	Time window (hour)
$P$	Precursor pattern sequence length
$s$	Precursor pattern sequence sampling step size
$N$	Prediction time (hour)

4.2.2 Evaluation indicators

Table 4 is the confusion matrix, which is utilized to record the prediction results of the model. True positive represents if the prediction result is true, and the actual result is also true, which is defined as TP. False positive represents if the prediction result is true, but the actual result is false, which is defined as FP. False negative represents if the prediction result is false, but the actual result is true, which is also defined as FN. True negative represents if the prediction result is false, and the actual result is also false, which is also defined as TN. We utilize 2 evaluation indicators, accuracy and F1 value, to evaluate the model. Accuracy is the percentage of prediction results in the total samples, which is defined as follows (Cheng et al. 2023):

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{{\text{TP} + \text{TN} + \text{FP} + \text{FN}}}$$

(17)

Table 4 The confusion matrix

Predicted \ Actual	True	False
True	TP	FP
False	FN	TN

The F1 value quantifies the overall performance of the model by combining precision and recall. Precision is defined with respect to the predicted samples: it is the proportion of samples predicted as positive that are actually positive. Recall is defined with respect to the original samples: it is the proportion of actually positive samples that are correctly predicted as positive. The calculation formula is as follows (Cheng et al. 2023):

$$F_{1} = 2 \times \frac{{{\text{Precision}} \times {\text{Recall}}}}{{{\text{Precision}} + {\text{Recall}}}}$$

(18)

where,

$${\text{Precision}} = \frac{{\text{TP}}}{{{\text{TP}} + {\text{FP}}}}$$

(19)

$${\text{Recall}} = \frac{{\text{TP}}}{{{\text{TP}} + {\text{FN}}}}$$

(20)
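The indicators can be checked numerically with a small helper. This is a sketch for the binary case, with accuracy taken as the fraction of correct predictions, (TP + TN) divided by the total number of samples; the function name and the example counts are hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts, not results from the paper.
m = binary_metrics(tp=8, fp=2, fn=4, tn=86)
```

For these counts, precision is 8/10 and recall is 8/12, so F1 equals 16/22, which is the same as 2TP / (2TP + FP + FN).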

4.2.3 The impact of different indicators

4.2.3.1 Determine the optimal prediction time $N$

To assess the effectiveness of the Bi-LSTM-based temporal prediction method proposed in this study, the prediction time $N$ of the prediction model is set to 6 h, 12 h, and 18 h. When 12 sets of 6-h data are used to predict the following 6 h, 12 h, and 18 h, the training and test sets contain 468 and 117 samples, respectively. Table 5 lists the data preprocessing parameters for the different prediction times $N$.

Table 5 The data preprocessing parameters with different prediction times $N$

Parameter	The meaning of parameter	Value
$I$	Time window (hour)	6
$P$	Precursor pattern sequence length	12
$s$	Precursor pattern sequence sampling step size	1
$N$	Prediction time (hour)	6/12/18

Tables 6, 7 and 8 show the confusion matrices of the model results for the different prediction times $N$, from which the accuracy and F1 value of the prediction results are computed. For prediction times $N$ of 6 h, 12 h, and 18 h, the accuracy values are 0.872, 0.838, and 0.803, respectively, and the F1 values are 0.892, 0.787, and 0.669, respectively. Figure 6 shows the prediction results for the different prediction times $N$; the ability of the model to predict coal burst clearly varies with $N$. The temporal prediction model performs best when the prediction time is set to 6 h, with an accuracy of 0.872 and an F1 value of 0.892, both higher than those of the other two parameter settings. Figure 7 shows the error distributions for the different prediction times $N$, where the error is the number of levels between the actual risk level and the predicted level. For example, if the actual risk level is strong and the predicted level is medium, the error is 1. In Fig. 7, the errors are limited to 1 level for all prediction times $N$. In summary, the prediction results show that the model has the best prediction performance when the prediction time $N$ is set to 6 h.

Table 6 The confusion matrix for the model results using the 6-h prediction time

Predicted \ Actual	No	Weak	Medium	Strong
No	12	6	0	0
Weak	1	55	5	0
Medium	0	3	33	0
Strong	0	0	0	2
Accuracy = 0.872, F1 = 0.892
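The accuracy reported for Table 6 and the error levels discussed above can be recomputed directly from the matrix. The sketch below assumes that rows are predicted levels and columns are actual levels, as the table header indicates; the function names are illustrative. The F1 value is not recomputed here because the averaging scheme used across the four classes is not stated in the text.

```python
LEVELS = ["No", "Weak", "Medium", "Strong"]

# Table 6: rows are predicted levels, columns are actual levels.
table6 = [
    [12, 6, 0, 0],
    [1, 55, 5, 0],
    [0, 3, 33, 0],
    [0, 0, 0, 2],
]

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def error_levels(cm):
    """Count samples by |predicted level - actual level|."""
    counts = {}
    for i, row in enumerate(cm):
        for j, n in enumerate(row):
            if n:
                counts[abs(i - j)] = counts.get(abs(i - j), 0) + n
    return counts

def per_class_precision_recall(cm):
    """Per-level (precision, recall), treating each level as the positive class."""
    out = {}
    for k, name in enumerate(LEVELS):
        predicted = sum(cm[k])
        actual = sum(row[k] for row in cm)
        out[name] = (cm[k][k] / predicted if predicted else 0.0,
                     cm[k][k] / actual if actual else 0.0)
    return out

acc = accuracy(table6)
errors = error_levels(table6)
```

Here 102 of the 117 test samples lie on the diagonal, which gives the reported accuracy of 0.872, and the remaining 15 samples are all off by exactly one level.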

Table 7 The confusion matrix for the model results using the 12-h prediction time

Predicted \ Actual	No	Weak	Medium	Strong
No	0	0	0	0
Weak	5	44	6	0
Medium	0	6	50	0
Strong	0	0	0	4
Accuracy = 0.838, F1 = 0.787

Table 8 The confusion matrix for the model results using the 18-h prediction time

Predicted \ Actual	No	Weak	Medium	Strong
No	0	0	0	0
Weak	3	30	6	0
Medium	0	9	60	2
Strong	0	0	3	4
Accuracy = 0.803, F1 = 0.669

4.2.3.2 Determine the optimal prediction time window $I$

To assess the influence of the time window $I$ on the model, we set the time window $I$ to 6 h, 12 h, and 18 h while keeping the prediction time $N$ at 6 h. The numbers of samples in the training and test sets are 468:117, 467:117, and 466:117, respectively. Table 9 lists the data preprocessing parameters for the different time windows $I$.

Table 9 The data preprocessing parameters with different time windows $I$

Parameter	The meaning of parameter	Value
$I$	Time window (hour)	6/12/18
$P$	Precursor pattern sequence length	12
$s$	Precursor pattern sequence sampling step size	1
$N$	Prediction time (hour)	6

Tables 10 and 11 show the confusion matrices of the model results for time windows $I$ of 12 h and 18 h, from which the accuracy and F1 value of the prediction results are computed. For time windows $I$ of 12 h and 18 h, the accuracy values are 0.803 and 0.735, respectively, and the F1 values are 0.829 and 0.785, respectively. For the time window $I$ of 6 h, the accuracy and F1 value are 0.872 and 0.892, respectively (Table 6). Figure 8 shows the prediction results for the different time windows $I$; the ability of the model to predict coal burst clearly varies with the size of the time window $I$. The temporal prediction model performs best when the time window is set to 6 h, with an accuracy of 0.872 and an F1 value of 0.892, both higher than those of the other two parameter settings. Figure 9 shows the error distributions for time windows $I$ of 12 h and 18 h, where the error is the number of levels between the actual risk level and the predicted level. The errors are limited to 1 level when the time window $I$ is set to 12 h. However, when the time window $I$ is set to 18 h, 2 errors reach level 2 in Fig. 9. The errors are also limited to 1 level when the time window $I$ is set to 6 h (Fig. 7a). Therefore, time windows $I$ of 6 h and 12 h keep the error range under control. In summary, the prediction results show that the model has the best prediction performance when the time window $I$ is set to 6 h. Furthermore, the model performs best overall when both the time window $I$ and the prediction time $N$ are set to 6 h.

Table 10 The confusion matrix for the model results using the 12-h time window

Predicted \ Actual	No	Weak	Medium	Strong
No	10	8	0	0
Weak	3	51	7	0
Medium	0	5	31	0
Strong	0	0	0	2
Accuracy = 0.803, F1 = 0.829

Table 11 The confusion matrix for the model results using the 18-h time window

Predicted \ Actual	No	Weak	Medium	Strong
No	10	9	0	0
Weak	3	47	11	0
Medium	0	8	27	0
Strong	0	0	0	2
Accuracy = 0.735, F1 = 0.785

4.2.3.3 Evaluate the influence of different indicators

To evaluate the influence of different indicators on the temporal prediction model, we carry out comparative experiments on different indicators. The baseline uses a time window $I$ of 6 h and a prediction time $N$ of 6 h. Table 12 shows the confusion matrix for the result of the temporal model without microseismic spatio-temporal characteristic indicators. Compared with the baseline, the accuracy and F1 value of the model decrease by 2.6% and 2.3%, respectively (from 0.872 to 0.846 and from 0.892 to 0.869), as shown in Fig. 9. Therefore, the integration of microseismic spatio-temporal characteristic indicators leads to an enhancement in prediction accuracy.

Table 12 The confusion matrix for the result of the temporal model without microseismic spatio-temporal characteristic indicators

Predicted \ Actual	No	Weak	Medium	Strong
No	11	5	0	0
Weak	2	55	7	0
Medium	0	4	31	0
Strong	0	0	0	2
Accuracy = 0.846, F1 = 0.869

Table 13 shows the confusion matrix for the result of the temporal model with microseismic spatio-temporal characteristic indicators only, which is used to assess the influence of the maximum energy, average energy, and frequency indicators on the temporal prediction model. Compared with the baseline, the accuracy and F1 value of the model decrease by 29.9% and 65.5%, respectively (Table 13). Figure 10 shows the error distributions for the indicator settings specified above; it is evident that the model fails to predict high-energy events in both cases, and 7 errors reach level 2. Therefore, fusing the maximum energy, average energy, and frequency indicators with the microseismic spatio-temporal characteristic indicators improves the prediction accuracy of the model. In summary, the temporal prediction method and the indicator selection proposed in this paper can improve the accuracy of predicting future high-energy events.

Table 13 The confusion matrix for the result of the temporal model with microseismic spatio-temporal characteristic indicators only

Predicted \ Actual	No	Weak	Medium	Strong
No	6	2	0	0
Weak	7	58	35	2
Medium	0	4	3	0
Strong	0	0	0	0
Accuracy = 0.573, F1 = 0.237

4.2.4 Temporal prediction in engineering application

To further evaluate the capabilities of the model, we apply it in engineering practice. Following Sect. 4.2.3, the time window $I$ is set to 6 h and the prediction time $N$ is set to 6 h, with a learning rate of 0.001. The other model settings remain unchanged. Given that 13 high-energy events occurred in March at LW2215, a total of 236 sets of time period data from January to February are used to predict the energy levels in March. The initial model is trained with the first 236 sets of time period data and then predicts the energy event level for the 237th set. The time window is then moved forward by one period of 6 h, and the model is updated. This procedure is repeated for a total of 122 rounds of simulated training.
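The rolling procedure described above can be sketched as a walk-forward loop. The sketch below is an illustration under stated assumptions: `MajorityModel` is a trivial placeholder standing in for the Bi-LSTM, the training window is assumed to slide (rather than expand) by one period per step, and the labels are synthetic. The figure of 236 initial periods is consistent with 59 days in January and February 2022 at four 6-h periods per day.

```python
from collections import Counter

class MajorityModel:
    """Placeholder for the Bi-LSTM: always predicts the most common label."""
    def fit(self, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self):
        return self.label

def walk_forward(labels, initial=236, steps=122):
    """Train on `initial` periods, predict the next one, slide by one, repeat."""
    if initial + steps > len(labels):
        raise ValueError("not enough periods for the requested number of steps")
    predictions = []
    for k in range(steps):
        train = labels[k:k + initial]  # sliding training window (an assumption)
        target = k + initial           # 0-based index of the period to predict
        model = MajorityModel().fit(train)
        predictions.append((target, model.predict(), labels[target]))
    return predictions

labels = [i % 4 for i in range(358)]  # synthetic labels, not the LW2215 data
preds = walk_forward(labels)
```

Each tuple holds the index of the predicted period, the predicted level, and the actual level, so the results can be tallied into a confusion matrix after the final step.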

Table 14 shows the confusion matrix for the result of the engineering application. The accuracy and F1 value are 0.811 and 0.776, respectively. Of the 13 high-energy events, the model correctly predicts 10 as strong risk, while the other 3 are predicted as medium risk, resulting in an accuracy of 0.770 for high-energy events. No false alarms occur for high-energy events, which means the false alarm rate is 0.

Table 14 The confusion matrix for the result of engineering applications

Predicted \ Actual	No	Weak	Medium	Strong
No	6	5	0	0
Weak	7	48	4	0
Medium	0	4	35	3
Strong	0	0	0	10
Accuracy = 0.811, F1 = 0.776
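The figures quoted for high-energy events follow from the "Strong" row and column of Table 14. A minimal check, assuming rows are predicted levels and columns are actual levels (variable names are illustrative):

```python
# Table 14: rows are predicted levels, columns are actual levels.
table14 = [
    [6, 5, 0, 0],
    [7, 48, 4, 0],
    [0, 4, 35, 3],
    [0, 0, 0, 10],
]
STRONG = 3  # index of the "Strong" level

strong_actual = sum(row[STRONG] for row in table14)  # actual high-energy events
strong_hit = table14[STRONG][STRONG]                 # predicted strong, actually strong
strong_predicted = sum(table14[STRONG])              # all "Strong" predictions
false_alarms = strong_predicted - strong_hit         # predicted strong, actually lower
hit_rate = strong_hit / strong_actual

total = sum(sum(row) for row in table14)
overall_accuracy = sum(table14[i][i] for i in range(4)) / total
```

The hit rate of 10/13 is approximately 0.77, and every "Strong" prediction corresponds to an actual high-energy event, which is why the false alarm count is zero.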

Figure 11 shows the error distribution of the engineering application. Of the 122 predicted time periods, the energy level is correctly predicted for 99, and the predictions for the remaining 23 have errors within 1 level. The engineering application shows that the temporal prediction model proposed in this paper can be applied effectively in practice.

Additionally, we compare the prediction performance of the microseismic spatio-temporal characteristic indicators in the temporal prediction model with that of the traditional threshold-based coal burst prediction method (Si et al. 2020), in which the threshold method is applied to analyze the microseismic spatio-temporal characteristic indicators. The result is presented in Fig. 12.

In Fig. 12, the purple dots represent time periods without high-energy events, and the green pentagrams represent time periods with high-energy events. A time window of 6 h is selected, and the threshold is set at 70% of the range between the minimum and maximum values of the microseismic spatio-temporal characteristic indicators for the month. When the peak value exceeds this threshold, it suggests that high-energy events may occur in the next 3 days (12 time periods). The threshold method successfully predicts only 5 of the 13 high-energy events in March (i.e., data between the two lines of the same color in Fig. 12), resulting in an accuracy of only 0.385. Compared with the threshold method, the temporal prediction method proposed in this paper improves the accuracy by 38.5% (0.770 versus 0.385). It can predict high-energy events in advance, allowing for proactive coal burst prevention and control measures to reduce potential losses in engineering applications.
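The threshold rule can be sketched as follows. This is one possible reading of the description, not the implementation of Si et al. (2020): the threshold is assumed to be the minimum plus 70% of the range of the indicator, a warning is assumed to stay active for the following `horizon` periods, and the function names and toy series are hypothetical.

```python
def threshold_warnings(indicator, fraction=0.7, horizon=12):
    """Mark the periods covered by a warning.

    A warning is raised when the indicator exceeds
    min + fraction * (max - min) and covers the next `horizon` periods.
    """
    lo, hi = min(indicator), max(indicator)
    threshold = lo + fraction * (hi - lo)
    active = [False] * len(indicator)
    for t, value in enumerate(indicator):
        if value > threshold:
            for k in range(t + 1, min(t + 1 + horizon, len(indicator))):
                active[k] = True
    return active

def hit_rate(active, event_periods):
    """Fraction of high-energy event periods covered by an active warning."""
    hits = sum(1 for t in event_periods if active[t])
    return hits / len(event_periods)

# Toy series with a short horizon; not the March data shown in Fig. 12.
indicator = [0.1, 0.2, 0.9, 0.3, 0.2, 0.1, 0.1, 0.2]
active = threshold_warnings(indicator, horizon=3)
rate = hit_rate(active, event_periods=[4, 7])
```

In the toy series, only the peak at index 2 exceeds the threshold, so the warning covers periods 3 to 5 and one of the two events is counted as predicted. With the counts reported above, 5 hits out of 13 events give the stated value of 0.385.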

4.3 Evaluation of spatial prediction method

The spatial prediction method is evaluated on the 13 high-energy events that occurred in March. The prediction time $N$ and time window $I$ are both set to 6 h, consistent with the temporal prediction. This alignment enables the application of spatial prediction in engineering. For the statistical analysis of the 13 high-energy events, the parameters $T_{1}$ and $T_{2}$ are set to 6 h and 3 days, respectively. The resulting spatial cloud maps are obtained by overlaying the data from these 2 sets of grids.
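The overlay of the short-time ($T_{1}$) and long-time ($T_{2}$) grids can be illustrated with a small sketch. The weights, the min-max normalization, and the grid contents below are assumptions chosen only to show the idea of superimposing two windows; the actual weighting scheme is the one defined by the spatial prediction model in Sect. 3.3.

```python
def normalise(grid):
    """Min-max scale a 2-D grid (list of rows) to the range [0, 1]."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in grid]

def overlay(short_grid, long_grid, w_short=0.5, w_long=0.5):
    """Weighted sum of two normalised grids, rescaled to [0, 1]."""
    a, b = normalise(short_grid), normalise(long_grid)
    combined = [[w_short * x + w_long * y for x, y in zip(ra, rb)]
                for ra, rb in zip(a, b)]
    return normalise(combined)

# Hypothetical per-cell values for the 6-h and 3-day windows.
short_grid = [[0.0, 1.0], [2.0, 4.0]]
long_grid = [[0.0, 2.0], [2.0, 2.0]]
rho = overlay(short_grid, long_grid)
```

The resulting grid takes values in [0, 1], so each cell can then be assigned a hazard level by comparing its value with the thresholds on $\rho$.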

Figure 13 shows the result of spatio-temporal prediction. It illustrates the spatio-temporal prediction cloud maps for 3 of the high-energy events, with white pentagrams indicating the locations of these events. The maps directly reflect the temporal dimension and the spatial hazard area division of the predicted microseismic events within the time window. Based on the spatio-temporal prediction results, the spatio-temporal characteristics of microseismic events during the time period can be observed intuitively.

Figure 14 shows the prediction results for the high-energy events. Of the 13 high-energy events, 6 are located in strong hazard areas, 3 in medium hazard areas, and 4 in weak hazard areas. This indicates that the spatial prediction method proposed in this paper is effective and can meet the requirements of engineering applications.

In summary, this paper integrates a temporal prediction model with a fusion spatial prediction method, which combines long-time and short-time windows, to construct a spatio-temporal prediction method for coal burst. It allows for the early prediction of high-energy events and provides valuable insights for implementing disaster prevention measures in mining areas.

5.Conclusions

This paper proposes a spatio-temporal prediction method for coal burst based on Bi-LSTM, which includes a temporal prediction method and a spatial prediction method. The temporal prediction method is based on Bi-LSTM and mainly includes data preprocessing, prediction model construction, model training, and model testing. Analysis and experiments show that the most effective configuration is to use a 6-h time window to predict the following 6 h. This paper also designs a spatial prediction method that superimposes the data weights of long-time and short-time windows to realize the spatial risk prediction of coal burst. Finally, this paper uses data from the LW2215 working face to verify the effectiveness of the proposed methods. In terms of temporal prediction, the accuracy reaches 0.811, the F1 value reaches 0.776, and the false alarm rate for high-energy events is 0. The prediction results for the 13 high-energy events are 10 strong hazards and 3 medium hazards. Moreover, compared with the traditional threshold-based coal burst prediction method, the accuracy is increased by 38.5%. In terms of spatial prediction, the 13 high-energy events fall in 6 strong, 3 medium, and 4 weak hazard areas.

It is worth noting that the proposed method has so far been tested and applied in a single mining area. How to transfer the coal burst prediction model to other mining areas is an important direction for future research, and we will optimize the method in more mining areas in search of a universal model. In addition, the hazardous area range given by the existing spatial prediction may exhibit some fluctuations, which will be refined in future research.

References

[1]	Bai XX, Cao AY, Cai W, Wen YY, Liu YQ, Wang SW, Li XW (2022) Rock burst mechanism induced by stress anomaly in roof thickness variation zone: a case study. Geomat Nat Haz Risk 13(1):1805–1830
[2]	Cai W, Dou LM, Si GY, Cao AY, He J, Liu S (2016) A principal component analysis/fuzzy comprehensive evaluation model for coal burst liability assessment. Int J Rock Mech Min Sci 81:62–69
[3]	Cai W, Dou LM, Zhang M, Cao WZ, Shi JQ, Feng LF (2018) A fuzzy comprehensive evaluation methodology for rock burst forecasting using microseismic monitoring. Tunn Undergr Space Technol 80:232–245
[4]	Cai W, Bai XX, Si GY, Cao WZ, Gong SY, Dou LM (2020) A monitoring investigation into rock burst mechanism based on the coupled theory of static and dynamic stresses. Rock Mech Rock Eng 53:5451–5471
[5]	Cao AY, Liu YQ, Yang X, Li S, Liu YP (2022) FDNet: knowledge and data fusion-driven deep neural network for coal burst prediction. Sensors 22(8):3088
[6]	Chen J, Zhu C, Du JS, Pu YY, Pan PZ, Bai JB, Qi QX (2022) A quantitative pre-warning for coal burst hazardous zones in a deep coal mine based on the spatio-temporal forecast of microseismic events. Process Saf Environ Prot 159:1105–1112
[7]	Chen F, Liang ZZ, Cao AY (2023) ConvLSTM for predicting short-term spatiotemporal distribution of seismic risk induced by large-scale coal mining. Nat Resour Res 32(3):1459–1479
[8]	Cheng XG, Qiao W, He H (2023) Study on deep learning methods for coal burst risk prediction based on mining-induced seismicity quantification. Geomech Geophys Geo-Energy GeoResour 9(1):145
[9]	Dehnad K (1987) Density estimation for statistics and data analysis. Chapman and Hall, London
[10]	Dou LM, Cai W, Cao AY, Guo W (2018) Comprehensive early warning of rock burst utilizing microseismic multi-parameter indices. Int J Min Sci Technol 28(5):767–774
[11]	Duan Y, Shen YR, Canbulat I, Luo X, Si GY (2021) Classification of clustered microseismic events in a coal mine using machine learning. J Rock Mech Geotech Eng 13(6):1256–1273
[12]	He JH, Dou LM, Gong S, Li J, Ma ZQ (2017) Rock burst assessment and prediction by dynamic and static stress analysis based on micro-seismic monitoring. Int J Rock Mech Min Sci 93:46–53
[13]	He MC, Ren FQ, Liu DQ (2018a) Rockburst mechanism research and its control. Int J Min Sci Technol 28(5):829–837
[14]	He YX, Pang YX, Zhang Q, Jiao Z, Chen Q (2018b) Comprehensive evaluation of regional clean energy development levels based on principal component analysis and rough set theory. Renew Energy 122:643–653
[15]	He SQ, Song DZ, Li ZL, He XQ, Chen JQ, Li DH, Tian XH (2019) Precursor of spatio-temporal evolution law of MS and AE activities for rock burst warning in steeply inclined and extremely thick coal seams under caving mining conditions. Rock Mech Rock Eng 52:2415–2435
[16]	Huang L, Xu YC, Liu SQ, Gai QK, Miao W, Li YB, Zhao LS (2022a) Research on the development law of pre-mining microseisms and risk assessment of floor water inrush: a case study of the Wutongzhuang coal mine in China. Sustainability 14(15):9774
[17]	Huang YP, Yan L, Cheng Y, Qi XM, Li ZX (2022b) Coal thickness prediction method based on VMD and LSTM. Electronics 11(2):232
[18]	Li B, Xu NW, Xiao PW, Xia Y, Zhou X, Gu GK, Yang XG (2023) Microseismic monitoring and forecasting of dynamic disasters in underground hydropower projects in southwest China: a review. J Rock Mech Geotech Eng 15(8):2158–2177
[19]	Liu YQ, Cao AY, Wang CB, Yang X, Wang Q, Bai XX (2023) Cluster analysis of moment tensor solutions and its application to rockburst risk assessment in underground coal mines. Rock Mech Rock Eng 56:6709–6734
[20]	Liu JF, He X, Huang HY, Yang JX, Dai JJ, Shi XC, Xue FJ, Rabczuk T (2024a) Predicting gas flow rate in fractured shale reservoirs using discrete fracture model and GA-BP neural network method. Eng Anal Boundary Elem 159:315–330
[21]	Liu JF, Qiu XS, Yang JX, Liang C, Dai JJ, Bian Y (2024b) Failure transition of shear-to-dilation band of rock salt under triaxial stresses. J Rock Mech Geotech Eng 16(1):56–64
[22]	Lu CP, Dou LM, Wu XR, Wang HM, Qin YH (2005) Frequency spectrum analysis on microseismic monitoring and signal differentiation of rock material. Chin J Geotech Eng 27(7):772–775
[23]	Lurka A (2021) Spatio-temporal hierarchical cluster analysis of mining-induced seismicity in coal mines using ward’s minimum variance method. J Appl Geophys 184:104249
[24]	Ma TH, Tang CA, Tang SB, Kuang L, Yu Q, Kong DQ, Zhu X (2018) Rockburst mechanism and prediction based on microseismic monitoring. Int J Rock Mech Min Sci 110:177–188
[25]	Majid K, He XQ, Song DZ, Tian XH, Li ZL, Xue YR, Khurram SA (2023) Extracting and predicting rock mechanical behavior based on microseismic spatio-temporal response in an ultra-thick coal seam mine. Rock Mech Rock Eng 56(5):3725–3754
[26]	Malkowski P, Niedbalski Z, Sojka W (2021) The assessment of the optimal time window for prediction of seismic hazard for longwall coal mining: the case study. Acta Geophys 69(3):691–699
[27]	Maxwell SC, Rutledge J, Jones R, Fehler M (2010) Petroleum reservoir characterization using downhole microseismic monitoring. Geophysics 75(5):75A129-75A137
[28]	Mi CN, Zuo JP, Sun YJ, Zhao SK (2022) Investigation on rockburst mechanism due to inclined coal seam combined mining and its control by reducing stress concentration. Nat Resour Res 31(6):3341–3364
[29]	Pu YY, Apel DB, Lingga B (2018) Rockburst prediction in kimberlite using decision tree with incomplete data. J Sustain Min 17(3):158–165
[30]	Pu YY, Apel DB, Liu V, Mitri H (2019) Machine learning methods for rockburst prediction-state-of-the-art review. Int J Min Sci Technol 29(4):565–570
[31]	Shu LY, Liu ZS, Wang K, Zhu NN, Yang J (2022) Characteristics and classification of microseismic signals in heading face of coal mine: implication for coal and gas outburst warning. Rock Mech Rock Eng 55(11):6905–6919
[32]	Si GY, Cai W, Wang SY, Li X (2020) Prediction of relatively high-energy seismic events using spatial–temporal parametrisation of mining-induced seismicity. Rock Mech Rock Eng 53:5111–5132
[33]	Soleimani F, Si GY, Roshan H, Zhang J (2023) Numerical modelling of gas outburst from coal: a review from control parameters to the initiation process. Int J Coal Sci Technol 10(1):81. https://doi.org/10.1007/s40789-023-00657-7
[34]	Wang CB, Si GY, Zhang CG, Cao AY, Canbulat I (2021) Location error based seismic cluster analysis and its application to burst damage assessment in underground coal mines. Int J Rock Mech Min Sci 143:104784
[35]	Wang SY, Si GY, Wang CB, Cai W, Li BL, Oh J, Canbulat I (2022) Quantitative assessment of the spatio-temporal correlations of seismic events induced by longwall coal mining. J Rock Mech Geotech Eng 14(5):1406–1420
[36]	Wang CP, Liu JF, Chen L, Liu J, Wang L, Liao YL (2024) Creep constitutive model considering nonlinear creep degradation of fractured rock. Int J Min Sci Technol 34(1):105–116
[37]	Wu Y, Lin YZ, Zhou Z, Bolton DC, Liu J, Johnson P (2018) Deepdetect: a cascaded region-based densely connected network for seismic event detection. IEEE Trans Geosci Remote Sens 57(1):62–75
[38]	Xu D, Liu JF, Liang C, Yang JX, Xu HN, Wang L, Liu J (2024) Effects of cyclic fatigue loads on surface topography evolution and hydro-mechanical properties in natural and artificial fracture. Eng Fail Anal 156:107801
[39]	Xu LJ, Fan CJ, Luo MK, Li S, Han J, Fu X, Xiao B (2023) Elimination mechanism of coal and gas outburst based on geo-dynamic system with stress–damage–seepage interactions. Int J Coal Sci Technol 10(1):74. https://doi.org/10.1007/s40789-023-00651-z
[40]	Zhang CG, Canbulat I, Hebblewhite B, Ward CR (2017) Assessing coal burst phenomena in mining and insights into directions for future research. Int J Coal Geol 179:28–44
[41]	Zhang C, Jin GH, Liu C, Li SG, Xue JH, Cheng RH, Wang XL, Zeng XZ (2021) Prediction of rockbursts in a typical island working face of a coal mine through microseismic monitoring technology. Tunn Undergr Space Technol 113:103972
[42]	Zhao YX, Jiang YD (2010) Acoustic emission and thermal infrared precursors associated with bump-prone coal failure. Int J Coal Geol 83(1):11–20
[43]	Zhao YS, Qi Q, Li JN, Zhao Z, Li BL (2023) Experimental investigations on the failure characteristics of brittle sandstone containing various heights of rectangle cavities under biaxial loading. Theoret Appl Fract Mech 124:103804

Funding

Key Technologies Research and Development Program, 2022YFC3004603, Anye Cao; Jiangsu Province International Collaboration Program-Key National Industrial Technology Research and Development Cooperation Projects, BZ2023050, Anye Cao; Natural Science Foundation of Jiangsu Province, BK20221109, Xu Yang; National Natural Science Foundation of China, 52274098, Anye Cao.

About this article

Cite this article

Yang, X., Liu, Y., Cao, A. et al. Coal burst spatio-temporal prediction method based on bidirectional long short-term memory network. Int J Coal Sci Technol 12, 11 (2025).

https://doi.org/10.1007/s40789-025-00759-4

Received: 15 November 2023

Revised: 30 January 2024

Accepted: 15 January 2025


Label	Maximum energy	Risk level
0	\(e_{\max }\) < 1 × 10³ J	No
1	1 × 10³ J ≤ \(e_{\max }\) < 1 × 10⁴ J	Weak
2	1 × 10⁴ J ≤ \(e_{\max }\) < 1 × 10⁵ J	Medium
3	\(e_{\max }\) ≥ 1 × 10⁵ J	Strong

Microseismic spatio-temporal characteristic indicators	Risk level
0 ≤ \(\rho\) < 0.2	No
0.2 ≤ \(\rho\) < 0.45	Weak
0.45 ≤ \(\rho\) < 0.7	Medium
0.7 ≤ \(\rho\) ≤ 1	Strong
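The two threshold tables above translate directly into lookup functions. The following is a minimal sketch; the function names are hypothetical, and the boundaries are copied from the tables.

```python
def energy_label(e_max):
    """Map the maximum energy in a time window (J) to a label from 0 to 3."""
    if e_max < 1e3:
        return 0  # No
    if e_max < 1e4:
        return 1  # Weak
    if e_max < 1e5:
        return 2  # Medium
    return 3      # Strong

def hazard_level(rho):
    """Map the normalised indicator rho in [0, 1] to a risk level."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if rho < 0.2:
        return "No"
    if rho < 0.45:
        return "Weak"
    if rho < 0.7:
        return "Medium"
    return "Strong"

label = energy_label(2.5e5)  # an event above 10^5 J is a high-energy event
level = hazard_level(0.5)
```

Each lower bound is inclusive, so an energy of exactly 1 × 10⁵ J is labelled Strong and a value of $\rho$ of exactly 0.45 is classed as Medium.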