Ensemble learning model for predicting coal and gas outbursts based on high-dimensional small samples
-
Abstract
Coal and gas outburst prediction data are characterized by high dimensionality and small sample sizes, which pose significant challenges to predictive modeling. To address this issue, this study constructed a database of 60 samples comprising seven indicators, including gas pressure, gas content, and coal failure type. The permutation importance method was used for feature dimensionality reduction, selecting five key features (initial velocity of gas emission, coal seam thickness, gas content, gas pressure, and coal sturdiness coefficient) to mitigate the impact of weakly correlated features on the prediction model. A Stacking ensemble model was developed using support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), logistic regression (LR), and extreme gradient boosting (XGBoost) as base learners and XGBoost as the meta-learner. Bayesian optimization (BO) was applied for global hyperparameter tuning, resulting in a BO-Stacking ensemble model for coal and gas outburst prediction. The Shapley additive explanations (SHAP) method was employed for interpretability analysis of the model's predictions. The results show that the BO-Stacking model, after feature reduction, achieved an accuracy of 92.4%, an F1 score of 0.956, a Kappa coefficient of 0.927, and an AUC value of 0.969, outperforming all individual models. The features ranked by importance were initial velocity of gas emission > gas content > gas pressure > coal sturdiness coefficient > coal seam thickness. The BO-Stacking ensemble learning model demonstrates strong predictive performance and stability, providing a novel approach for coal and gas outburst prediction.
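The accuracy, F1 score, and Kappa coefficient reported above can be illustrated with a minimal sketch in plain Python. The binary labelling (1 = outburst, 0 = no outburst), the toy label vectors, and the function names are assumptions made for illustration only; they are not the study's data or code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the agreement
    expected by chance from the marginal label frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_e == 1.0:
        # Degenerate case (a single class everywhere); returning 0.0 is
        # a convention chosen for this sketch.
        return 0.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten samples (illustrative, not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1       = {f1_score(y_true, y_pred):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance, which makes it a useful complement to accuracy when a small sample has unbalanced classes.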
-